Peter Koo
@pkoo562.bsky.social
AI4Science researcher. Associate Professor @CSHL. My lab advances AI for genomics and healthcare!

http://koo-lab.github.io
Beware of LLM blindspots. #AI4Science
November 8, 2025 at 9:24 PM
Exciting symposium on AI and Biology at EMBO | EMBL in Heidelberg on 10-13 March 2026!

Excellent lineup of invited speakers across various scales of biology!

Deadline for abstract submission is coming up — Dec 2.

🔗 www.embl.org/about/info/c...

#EESAIBio @EMBLEvents
November 7, 2025 at 12:16 AM
We find that regulatory DNA is readily reprogrammable with a few key mutations! We observed a similar phenomenon across all the genomic DNNs we tested! 12/N
October 9, 2025 at 12:08 PM
We also compared backgrounds across ChromBPNet models independently trained on DNase-seq and ATAC-seq, and we observed similar backgrounds! This suggests these mutagenesis-robust patterns are important context that reflects properties of the local sequence space. 10/N
October 9, 2025 at 12:08 PM
While previous analyses focused on differences in attr maps across clusters, a surprising observation was that there were also shared patterns. We disentangled the attribution signals that are sensitive versus robust to mutagenesis – we call them foreground and background. 9/N
October 9, 2025 at 12:08 PM
This analysis flagged 2 key mutations at positions 170 & 174 that created a new CAAT box. To test necessity & sufficiency, we mutated each individually and together, then examined attr maps + predictions:
- single mutations -> no change
- double mutation -> CAAT box + new Inr

8/N
October 9, 2025 at 12:08 PM
Applying SEAM to CLIPNET, which predicts transcriptional activity measured via PRO-cap, we find that many SNVs lead to new clusters in the PIK3R3 promoter. A few specific mutations can quantitatively tune gene expression and SEAM can find them! 7/N
October 9, 2025 at 12:08 PM
Now, if we plot the percent mismatch of the nucleotides with respect to WT for each cluster, you can see yellow bars reflecting that all sequences in the cluster share the same single-nucleotide mutation. This analysis pinpoints the exact mutation that led to the new mechanism! 6/N
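As a toy sketch of this per-cluster mismatch analysis (the WT and cluster sequences below are made-up stand-ins, not data from the paper):

```python
import numpy as np

def percent_mismatch(seqs, wt):
    """Fraction of sequences in a cluster whose base differs from
    wild type at each position; a bar near 1.0 flags a mutation
    shared by (nearly) all sequences in that cluster."""
    arr = np.array([list(s) for s in seqs])
    return (arr != np.array(list(wt))).mean(axis=0)

wt = "ACGT"
cluster = ["ACGA", "ACGA", "TCGA"]  # all three share the T->A change at position 3
mm = percent_mismatch(cluster, wt)  # -> approx. [0.33, 0.0, 0.0, 1.0]
```

The position where the mismatch fraction hits 1.0 is the mutation shared by the whole cluster, i.e., the candidate driver of the new mechanism.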
October 9, 2025 at 12:08 PM
Low entropy reflects that all sequences share the same nucleotide, while high entropy reflects that different mutations destroyed the motif. Sometimes, we see a motif-preserving signature outside the vertical bands. This represents a de novo motif that appeared within that cluster. 5/N
October 9, 2025 at 12:08 PM
If we calculate the positional entropy of the sequences within each cluster, we get a cluster summary matrix. The vertical bands highlight the locations of the motifs in the WT seq, and the entropy levels indicate whether each motif is present in the attr maps of each cluster. 4/N
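A minimal sketch of the positional-entropy calculation (toy equal-length sequences standing in for a cluster):

```python
import numpy as np

def positional_entropy(seqs, alphabet="ACGT"):
    """Shannon entropy (bits) at each position across a set of
    equal-length sequences. Low entropy = conserved nucleotide;
    high entropy = the position varies across the cluster."""
    idx = {c: i for i, c in enumerate(alphabet)}
    counts = np.zeros((len(seqs[0]), len(alphabet)))
    for s in seqs:
        for j, c in enumerate(s):
            counts[j, idx[c]] += 1
    p = counts / counts.sum(axis=1, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        plogp = np.where(p > 0, p * np.log2(p), 0.0)
    return -plogp.sum(axis=1)

# Conserved positions give 0 bits; a fully mixed position gives 2 bits.
seqs = ["ACGT", "ACGA", "ACGC", "ACGG"]
h = positional_entropy(seqs)  # -> [0., 0., 0., 2.]
```

Stacking one such entropy vector per cluster gives the cluster summary matrix described above.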
October 9, 2025 at 12:08 PM
Attr maps can sometimes be easy to interpret, and sometimes they're complex. SEAM's clustered attr maps are cleaner (think SmoothGrad), and partial random mutagenesis, which occasionally disrupts key binding sites, lets SEAM decompose complex mechanisms. 3/N
October 9, 2025 at 12:08 PM
SEAM is conceptually simple. Starting from a reference sequence:
1) sample in a local region of sequence space via partial random mutagenesis
2) calculate attr maps to unveil the mechanisms
3) cluster attr maps based on shared mechanisms
4) perform cluster-based sequence analysis

2/N
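The four steps above can be sketched end-to-end as a toy (the attribution function and the tiny k-means here are simplified stand-ins; SEAM itself uses model-derived attribution maps and its own clustering):

```python
import numpy as np

rng = np.random.default_rng(0)
ALPHABET = np.array(list("ACGT"))

def mutagenize(ref, n_seqs=200, mut_rate=0.1):
    """1) Sample a local neighborhood of sequence space via partial
    random mutagenesis (a 'mutation' may redraw the same base)."""
    seqs = np.tile(np.array(list(ref)), (n_seqs, 1))
    mask = rng.random(seqs.shape) < mut_rate
    seqs[mask] = rng.choice(ALPHABET, size=int(mask.sum()))
    return seqs

def toy_attribution(seqs, motif="TATA"):
    """2) Stand-in attribution map: positions covered by an intact
    copy of `motif` get weight 1 (a real pipeline would compute
    attributions from a trained model)."""
    attrs = np.zeros(seqs.shape)
    m = np.array(list(motif))
    for i, s in enumerate(seqs):
        for j in range(len(s) - len(m) + 1):
            if (s[j:j + len(m)] == m).all():
                attrs[i, j:j + len(m)] = 1.0
    return attrs

def cluster_attrs(attrs, n_clusters=2, n_iter=20):
    """3) Cluster attribution maps; a small k-means with
    farthest-point initialization stands in for SEAM's clustering."""
    centers = [attrs[0]]
    for _ in range(1, n_clusters):
        d = np.min([((attrs - c) ** 2).sum(1) for c in centers], axis=0)
        centers.append(attrs[d.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        labels = ((attrs[:, None] - centers[None]) ** 2).sum(-1).argmin(1)
        for c in range(n_clusters):
            if (labels == c).any():
                centers[c] = attrs[labels == c].mean(0)
    return labels

# 4) Cluster-based sequence analysis, e.g., per-cluster motif presence
ref = "GGGGTATAGGGG"
seqs = mutagenize(ref)
labels = cluster_attrs(toy_attribution(seqs))
```

Here the mutants split into clusters where the TATA stand-in motif survived versus was disrupted, which is the kind of mechanism-level grouping the per-cluster entropy and mismatch analyses then interrogate.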
October 9, 2025 at 12:08 PM
Which mutations rewire function of regulatory DNA?

Excited to share SEAM: Systematic Explanation of Attribution-based Mechanisms. SEAM is an explainable AI method that dissects cis-regulatory mechanisms learned by seq2fun genomic deep learning models.

Led by @EESetiz

1/N 🧵👇
October 9, 2025 at 12:03 PM
Congratulations to John Clarke, Michel Devoret and John Martinis on receiving the 2025 Nobel Prize in Physics!
www.nobelprize.org/prizes/physi...

I have fond memories of my time in the Clarke lab, where I did my Honors Thesis on ultra low-field MRI w/ SQUIDs as an undergrad at UC Berkeley!
October 7, 2025 at 2:16 PM
Richard Bonneau giving the last keynote on navigating the complexity of drug discovery and their lab-in-the-loop for molecule design! #MLCB
September 11, 2025 at 5:40 PM
First talk: a (surprise) keynote by Jacob Schreiber from UMass Medical, talking about fruit-themed AI tools for understanding and designing regulatory DNA
September 11, 2025 at 1:44 PM
Now Barbara Engelhardt giving a keynote on characterizing behaviors of modified T cells in live cell imaging data using machine learning!
September 10, 2025 at 5:58 PM
Next talk by Courtney Shearer, who is talking about genomic language models for zero-shot promoter indel effects!
September 10, 2025 at 3:16 PM
Next talk by Alan Murphy and Masayuki (Moon) Nagai (from my lab!), who are talking about how naively fine-tuning genomic DNNs leads to catastrophic forgetting, and propose *iterative causal refinement* to move from learned associations toward a causal understanding of cis-regulatory biology!
September 10, 2025 at 2:54 PM
Next talk by Johannes Linder at Calico. Talking about expanding genomic seq2fun DNNs with RBP binding and RNA processing data to consider post-transcriptional regulation.
September 10, 2025 at 2:38 PM
Some technical delays but we are all set!

First talk by Alexis Battle! @alexisbattle.bsky.social
September 10, 2025 at 1:52 PM
Here's another unpublished result:

We compared probing strategies to assess how informative the pretrained representations are—benchmarking Evo2 vs D3 on Drosophila enhancer activity measured via STARR-seq.

Again, D3 outperforms Evo2 (and one-hot) across all probing methods!
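As one example of such a probing strategy, a hypothetical minimal linear (ridge) probe on frozen embeddings; the embeddings and activities below are random stand-ins, not real Evo2/D3 outputs or STARR-seq data:

```python
import numpy as np

def ridge_probe(emb, y, alpha=1.0):
    """Fit closed-form ridge regression from frozen embeddings to a
    scalar readout (e.g., enhancer activity); returns a predictor."""
    X = np.hstack([emb, np.ones((len(emb), 1))])  # append bias column
    w = np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)
    return lambda e: np.hstack([e, np.ones((len(e), 1))]) @ w

# Random stand-in "embeddings" and activities
rng = np.random.default_rng(1)
emb = rng.normal(size=(100, 16))
y = emb @ rng.normal(size=16) + 0.1 * rng.normal(size=100)
predict = ridge_probe(emb, y)
r = np.corrcoef(predict(emb), y)[0, 1]  # in-sample fit quality
```

The idea is that a representation is informative to the extent that even a probe this simple (or a small CNN head) recovers the activity readout; a real benchmark would compare held-out performance across probe types and embedding sources.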
July 16, 2025 at 12:17 PM
But, when we trained D3 (score-entropy discrete diffusion for regulatory DNA) in an unsupervised manner on the genomic sequences, probing the representations of D3 was comparable to supervised SOTA (even with a basic CNN)! (100M parameters vs 40B parameters)
July 16, 2025 at 12:17 PM
Building virtual cells is a great goal in the age of AI, but it requires far more than training transformers with scRNAseq.

*Scaling* as the primary strategy with hopes of emergent properties is lazy.

Will the plan to fuse representations across mediocre (unimodal) foundation models work?!
May 28, 2025 at 4:39 PM
We also trained supervised models on synthetic data generated by D3 (trained on just 25% of the DeepSTARR training set).

The result? D3-generated sequences are informative—they improve downstream supervised models, especially when paired with training tricks like EvoAug! (9/n)
May 23, 2025 at 1:55 PM