Lightnews — Scholar-powered news

Reposted by Pooja Kathail

Liana Lareau

@lianafaye.bsky.social

This preprint from Helen Sakharova is one of the coolest things to come out of my lab: “Protein language models reveal evolutionary constraints on synonymous codon choice.” Codon choice is a big puzzle in how information is encoded in genomes, and we have a new angle. www.biorxiv.org/content/10.1...

Protein language models reveal evolutionary constraints on synonymous codon choice

Evolution has shaped the genetic code, with subtle pressures leading to preferences for some synonymous codons over others. Codons are translated at different speeds by the ribosome, imposing constrai...

www.biorxiv.org

August 7, 2025 at 8:29 AM

Reposted by Pooja Kathail

Anshul Kundaje

@anshulkundaje.bsky.social

Congratulations to incoming postdoc @rrastogi.bsky.social for being awarded the Warren Alpert Postdoctoral Scholarship! Look forward to having him join us soon!

June 9, 2025 at 7:23 PM

Reposted by Pooja Kathail

David A Knowles

@davidaknowles.bsky.social

We had a bunch of requests so we're extending the #MLCB2025 deadline to June 3rd (anywhere on earth)! cmt3.research.microsoft.com/MLCB2025 to submit.

May 31, 2025 at 10:30 PM

Reposted by Pooja Kathail

Sara Mostafavi

@saramostafavi.bsky.social

Some encouraging news for cross-gene generalization of allele effects in S2F models. www.biorxiv.org/content/10.1...

Deep genomic models of allele-specific measurements

Allele-specific quantification of sequencing data, such as gene expression, allows for a causal investigation of how DNA sequence variations influence cis gene regulation. Current methods for analyzin...

www.biorxiv.org

April 16, 2025 at 1:46 AM

Reposted by Pooja Kathail

Sara Mostafavi

@saramostafavi.bsky.social

Our new pre-print, investigating a few important questions when we train S2F models on different types of MPRA datasets. Congrats to Yilun and @xinmingtu.bsky.social www.biorxiv.org/content/10.1...

Investigating Data Size, Sequence Diversity, and Model Complexity in MPRA-based Sequence-to-Function Prediction

We created the MPRA Dataset Collection (MDC), a curated resource of MPRA data from 12 studies comprising over 150 million labeled DNA subsequences. These datasets include both random and natural genom...

www.biorxiv.org

March 15, 2025 at 3:02 AM

Reposted by Pooja Kathail

Jeremy Berg

@jeremymberg.bsky.social

I have confirmation from several sources now that all T32s, many F30s and F31s, and most or all Center awards (P30, P50) have been terminated at Columbia.

This is quite damaging to research and to individuals.

This is pure terrorism and cannot be legal. But litigation will take time...

March 11, 2025 at 2:30 PM

Reposted by Pooja Kathail

David A Knowles

@davidaknowles.bsky.social

Wow. "NIH" canceled my co-mentored (with Dave Sulzer) PhD student's F31 funding. His work is on understanding the genetics and neuroscience of language learning disorders. F31 provides no indirect $ to Columbia, just pays his salary. Not that it should matter, but he's an American citizen. W.T.F.

March 11, 2025 at 12:41 PM

Reposted by Pooja Kathail

Fernando Pérez

@fernandoperez.org

It's today, T-3h! If you're in the East Bay and care about science or education (i.e. if you care about living on this planet in any form 😃), join us, 11:45 at Upper Sproul!

And if you're elsewhere, look up a local event in your area, there's a LOT happening today!

www.standup4scienceberkeley.com

Map of Northern hemisphere with many blue place markers.

March 7, 2025 at 4:43 PM

Reposted by Pooja Kathail

Sara Mostafavi

@saramostafavi.bsky.social

Our new paper describing a scalable approach for training sequence-to-function models on personal genomes ("personal genome training"), includes our observations on when this works and its limitations. www.biorxiv.org/content/10.1...
Congrats: Anna, @xinmingtu.bsky.social, @lxsasse.bsky.social

A scalable approach to investigating sequence-to-expression prediction from personal genomes

A key promise of sequence-to-function (S2F) models is their ability to evaluate arbitrary sequence inputs, providing a robust framework for understanding genotype-phenotype relationships. However, despite strong performance across genomic loci, S2F models struggle with inter-individual variation. Training a model to make genotype-dependent predictions at a single locus, an approach we call personal genome training, offers a potential solution. We introduce SAGE-net, a scalable framework and software package for training and evaluating S2F models using personal genomes. Leveraging its scalability, we conduct extensive experiments on model and training hyperparameters, demonstrating that training on personal genomes improves predictions for held-out individuals. However, the model achieves this by identifying predictive variants rather than learning a cis-regulatory grammar that generalizes across loci. This failure to generalize persists across a range of hyperparameter settings. These findings highlight the need for further exploration to unlock the full potential of S2F models in decoding the regulatory grammar of personal genomes. Scalable software and infrastructure development will be critical to this progress.

www.biorxiv.org

February 23, 2025 at 11:31 PM

Reposted by Pooja Kathail

Andrew Marderstein

@amarderstein.bsky.social

New preprint w/ @soumyakundu.bsky.social @sbmontgom.bsky.social @anshulkundaje.bsky.social !

Using deep learning & scATAC-seq, we studied context-specific variants in disease & evolution, and introduce FLARE for de novo mutations—w/ application to autism-affected families.

doi.org/10.1101/2025...

Mapping the regulatory effects of common and rare non-coding variants across cellular and developmental contexts in the brain and heart

Whole genome sequencing has identified over a billion non-coding variants in humans, while GWAS has revealed the non-coding genome as a significant contributor to disease. However, prioritizing causal...

www.biorxiv.org

February 19, 2025 at 1:32 PM

Reposted by Pooja Kathail

Saori Sakaue

@saorisakaue.bsky.social

📣Excited to share my last postdoc paper with @soumya-boston.bsky.social on eQTL mechanisms depending on where the RNA is in the cell! @broadinstitute.org @harvardmed.bsky.social
TL;DR: Early RNA eQTL variants in the nucleus and late RNA eQTL variants in the cytosol have distinct molecular mechanisms 🧵

February 27, 2025 at 2:21 AM

Reposted by Pooja Kathail

Peter Koo

@pkoo562.bsky.social

[SAVE THE DATE] MLCB 2025 is happening Sept 10-11 at the NY Genome Center in NYC!

Attend the premier conference at the intersection of ML & Bio, share your research and make lasting connections!

Submission deadline: June 1
More details: mlcb.github.io

Help spread the word—please RT! #MLCB2025

February 5, 2025 at 2:50 AM

Reposted by Pooja Kathail

David A Knowles

@davidaknowles.bsky.social

#MLCB2025 will be Sept 10-11 at @nygenome.org in NYC! Paper deadline June 1st & in-person registration will open in May. Please sign up for our mailing list groups.google.com/g/mlcb/ for future announcements. More details at mlcb.github.io. Please RP!

January 27, 2025 at 6:40 PM

Reposted by Pooja Kathail

Austin Wang

@austintwang.bsky.social

(1/10) Excited to announce our latest work! @arpita-s.bsky.social, @amanpatel100.bsky.social, and I will be presenting DART-Eval, a rigorous suite of evals for DNA Language Models on transcriptional regulatory DNA at #NeurIPS2024. Check it out! arxiv.org/abs/2412.05430

DART-Eval: A Comprehensive DNA Language Model Evaluation Benchmark on Regulatory DNA

Recent advances in self-supervised models for natural language, vision, and protein sequences have inspired the development of large genomic DNA language models (DNALMs). These models aim to learn gen...

arxiv.org

December 11, 2024 at 2:30 AM

Reposted by Pooja Kathail

Amy Lu

@amyxlu.bsky.social

1/🧬 Excited to share PLAID, our new approach for co-generating sequence and all-atom protein structures by sampling from the latent space of ESMFold. This requires only sequences during training, which unlocks more data and annotations:

bit.ly/plaid-proteins
🧵

December 6, 2024 at 5:44 PM

Pooja Kathail

@poojakathail.bsky.social

Super excited to share our review on genomic deep learning models for non-coding variant effect prediction, with Ayesha Bajwa and Nilah Ioannidis. We’d like this review to be a useful resource, and welcome any feedback, comments, or questions! 1/4

arxiv.org/abs/2411.11158

Leveraging genomic deep learning models for non-coding variant effect prediction

The majority of genetic variants identified in genome-wide association studies of complex traits are non-coding, and characterizing their function remains an important challenge in human genetics. Gen...

arxiv.org

November 20, 2024 at 1:31 AM
