Javier Mendoza-Revilla
javiermenrev.bsky.social
Interested in AI in Genomics - Computational Geneticist @InstaDeep
Reposted by Javier Mendoza-Revilla
Our new paper describes a scalable approach for training sequence-to-function models on personal genomes ("personal genome training") and includes our observations on when this works and on its limitations. www.biorxiv.org/content/10.1...
Congrats: Anna, @xinmingtu.bsky.social, @lxsasse.bsky.social
A scalable approach to investigating sequence-to-expression prediction from personal genomes
A key promise of sequence-to-function (S2F) models is their ability to evaluate arbitrary sequence inputs, providing a robust framework for understanding genotype-phenotype relationships. However, despite strong performance across genomic loci, S2F models struggle with inter-individual variation. Training a model to make genotype-dependent predictions at a single locus, an approach we call personal genome training, offers a potential solution. We introduce SAGE-net, a scalable framework and software package for training and evaluating S2F models using personal genomes. Leveraging its scalability, we conduct extensive experiments on model and training hyperparameters, demonstrating that training on personal genomes improves predictions for held-out individuals. However, the model achieves this by identifying predictive variants rather than learning a cis-regulatory grammar that generalizes across loci. This failure to generalize persists across a range of hyperparameter settings. These findings highlight the need for further exploration to unlock the full potential of S2F models in decoding the regulatory grammar of personal genomes. Scalable software and infrastructure development will be critical to this progress.
Competing Interest Statement: The authors have declared no competing interest.
www.biorxiv.org
February 23, 2025 at 11:31 PM
Reposted by Javier Mendoza-Revilla
We trained a genomic language model on all observed evolution, which we are calling Evo 2.

The model achieves an unprecedented breadth in capabilities, enabling prediction and design tasks from molecular to genome scale and across all three domains of life.
February 19, 2025 at 4:42 PM
Reposted by Javier Mendoza-Revilla
Selective State Space Models Outperform Transformers at Predicting RNA-Seq Read Coverage https://www.biorxiv.org/content/10.1101/2025.02.13.638190v1
February 18, 2025 at 2:33 AM
Reposted by Javier Mendoza-Revilla
Can DNA sequence models predict mutations affecting human traits?

We introduce TraitGym, a curated benchmark of causal regulatory variants for 113 Mendelian & 83 complex traits, and evaluate functional genomics and DNA language models. Joint work w/ Gökcen Eraslan and @yun-s-song.bsky.social 🧵👇
February 13, 2025 at 8:57 PM