Jeremy Parker Yang
jeremyparkeryang.bsky.social
PhD Student @ UCSD
ML/Bio, genomic language models, epigenetics
Reposted by Jeremy Parker Yang
How well can deep learning models predict the effect of modifying chromatin on gene expression???

Our work -- led by Sanjit Batra and Alan Cabrera when they were in @yun-s-song.bsky.social’s and Isaac Hilton’s labs -- tries to answer this.

🧵🧬🧪

elifesciences.org/reviewed-pre...
Predicting the effect of CRISPR-Cas9-based epigenome editing
elifesciences.org
May 30, 2025 at 2:45 AM
Reposted by Jeremy Parker Yang
Congrats to Nathaniel and Sri for their exciting work teaching protein language models to generate beyond what evolution has explored. They introduce Reinforcement Learning from eXperimental Feedback (RLXF) to steer generation toward enhanced and non-natural functions
www.biorxiv.org/content/10.1...
May 8, 2025 at 6:25 PM
Reposted by Jeremy Parker Yang
Protein language model likelihoods are better zero-shot mutation effect predictors when the model has perplexity 3-6 on the wildtype sequence.

www.biorxiv.org/content/10.1...
April 30, 2025 at 6:18 PM
Reposted by Jeremy Parker Yang
One of the toughest parts of the field of massively parallel reporter assays, which measure thousands of elements or more, is that there are hundreds of pubs using them but no central repo to easily locate the results... until now! Great collab w/ Jingjing Zhao, Ilias G-S, and @nadavahituv.bsky.social!!
MPRAbase a Massively Parallel Reporter Assay database
An international, peer-reviewed genome sciences journal featuring outstanding original research that offers novel insights into the biology of all organisms
genome.cshlp.org
April 22, 2025 at 6:52 PM
Reposted by Jeremy Parker Yang
@thejohnnyyu.bsky.social, @therealnima.bsky.social, and I are excited to tell you about Tahoe-100M! The largest publicly available single-cell dataset that measures the effect of 1200 genes on 50 cell line models. The Vevo team has outdone itself. #Tahoe100M www.biorxiv.org/content/10.1...
Tahoe-100M: A Giga-Scale Single-Cell Perturbation Atlas for Context-Dependent Gene Function and Cellular Modeling
Building predictive models of the cell requires systematically mapping how perturbations reshape each cell's state, function, and behavior. Here, we present Tahoe-100M, a giga-scale single-cell atlas ...
www.biorxiv.org
February 25, 2025 at 1:25 PM
Really enjoyed this creative paper using "DNA prompts" to guide Evo in designing genomic sequences. Curious to see how this research evolves.

www.biorxiv.org/content/10.1...
Semantic mining of functional de novo genes from a genomic language model
Generative genomics models can design increasingly complex biological systems. However, effectively controlling these models to generate novel sequences with desired functions remains a major challeng...
www.biorxiv.org
December 19, 2024 at 3:40 AM
Reposted by Jeremy Parker Yang
Finally out! We present EXTRA-seq, a new EXTended Reporter Assay to quantify endogenous enhancer-promoter communication at kb scale!
www.biorxiv.org/content/10.1...
A 🧵about what it can do:
#SynBio #DeepLearning #GeneRegulation
EXTRA-seq: a genome-integrated extended massively parallel reporter assay to quantify enhancer-promoter communication
Precise control of gene expression is essential for cellular function, but the mechanisms by which enhancers communicate with promoters to coordinate this process are not fully understood. While seque...
biorxiv.org
December 16, 2024 at 2:39 PM
Reposted by Jeremy Parker Yang
Can we bypass the resource bottleneck of pretraining genomic Foundation Models? Our work L2G repurposes LLMs for genomics via cross-modal transfer, matching fine-tuned genomic FMs. Kudos to Wenduo & fantastic collab w/ @atalwalkar.bsky.social. L2G, language to genome; L2G, life’s too good!
December 11, 2024 at 1:41 PM
Reposted by Jeremy Parker Yang
Even as an interpretable ML researcher, I wasn't sure what to make of Mechanistic Interpretability, which seemed to come out of nowhere not too long ago.

But then I found the paper "Mechanistic?" by @nsaphra.bsky.social and @sarah-nlp.bsky.social, which clarified things.
November 20, 2024 at 8:00 AM
Reposted by Jeremy Parker Yang
Super excited to share our review on genomic deep learning models for non-coding variant effect prediction, with Ayesha Bajwa and Nilah Ioannidis. We’d like this review to be a useful resource, and welcome any feedback, comments, or questions! 1/4

arxiv.org/abs/2411.11158
Leveraging genomic deep learning models for non-coding variant effect prediction
The majority of genetic variants identified in genome-wide association studies of complex traits are non-coding, and characterizing their function remains an important challenge in human genetics. Gen...
arxiv.org
November 20, 2024 at 1:31 AM