Lightnews — Scholar-powered news

Sarah Gurev

@sarahgurev.bsky.social


Postdoc @ Debbie Marks Lab, Harvard | Prev. PhD @ MIT EECS || ML for Proteins + Viruses 🦠


Sarah Gurev

@sarahgurev.bsky.social

Thanks!

August 20, 2025 at 8:50 PM

Sarah Gurev

@sarahgurev.bsky.social

🦠The future of pathogen forecasting needs rigorous benchmarks and domain-specific modeling, not only bigger PLMs. EVEREST is a step in that direction.

🔗Paper: biorxiv.org/content/10.1...
💻Code + data: github.com/debbiemarksl...
12/12

Variant effect prediction with reliability estimation across priority viruses

Viruses pose a significant threat to global health due to their rapid evolution, adaptability, and increasing potential for cross-species transmission. While advances in machine learning and the growi...

biorxiv.org

August 17, 2025 at 3:42 AM

Sarah Gurev

@sarahgurev.bsky.social

🙏Amazing collaboration co-led with Noor Youssef and Navami Jain, @deboramarks.bsky.social, and our funders @cepi.net!
11/12

August 17, 2025 at 3:42 AM

Sarah Gurev

@sarahgurev.bsky.social

This matters for:
⚠️ Future-proof vaccine and therapeutics design
⚠️ Monitoring of high-pandemic-risk viruses
⚠️ Dual-use biosecurity risk assessment

Without reliable models, we risk underestimating viral evolution—and overestimating our ability to counter it.
10/12

August 17, 2025 at 3:42 AM

Sarah Gurev

@sarahgurev.bsky.social

EVEREST highlights:
✅ Where models fail—and why
✅ Which viruses are least/most predictable
✅ How to estimate per-protein, model-specific reliability
✅ Concrete steps to improve ML for viral mutation prediction
9/12

August 17, 2025 at 3:42 AM

Sarah Gurev

@sarahgurev.bsky.social

🌍Current models fail to reliably predict mutations in more than half of the high-priority viruses identified by the WHO.
8/12

August 17, 2025 at 3:42 AM

Sarah Gurev

@sarahgurev.bsky.social

💪Is bigger always better? Maybe not for other taxa, but for viruses, yes! Here, models continue to improve as the number of parameters increases.
7/12

August 17, 2025 at 3:42 AM

Sarah Gurev

@sarahgurev.bsky.social

🤏Why? Viruses are severely underrepresented in training datasets (<1%) and are further downsampled after common clustering approaches.
6/12

August 17, 2025 at 3:42 AM

Sarah Gurev

@sarahgurev.bsky.social

📉Despite the hype, protein language models trained across the “protein universe” are outperformed by even the simplest, site-independent alignment-based model.
5/12

August 17, 2025 at 3:42 AM

Sarah Gurev

@sarahgurev.bsky.social

💭Imagine: It’s Day 0 of an outbreak and there’s little experimental data. Computational mutational effect predictions could provide valuable information…if we could trust them. Can we?

EVEREST doesn’t just assess performance. It also quantifies reliability for new viruses.
4/12

August 17, 2025 at 3:42 AM

Sarah Gurev

@sarahgurev.bsky.social

🚀To find out, we built EVEREST: Evolutionary Variant Effect prediction with Reliability ESTimation.

We benchmark models across 45 viral deep mutational scanning datasets spanning >340,000 mutations.
3/12

August 17, 2025 at 3:42 AM

Sarah Gurev

@sarahgurev.bsky.social

🦠 Protein language models (PLMs) have shown impressive performance in predicting mutation effects. But... viruses are a different beast.

They evolve fast, cross species, and are under pressure from host immunity. Do PLMs still work here?
2/12

August 17, 2025 at 3:42 AM
