Lightnews — Scholar-powered news

Elana Simon

@elanasimon.bsky.social

For more information, check the preprint! (9/9)
www.biorxiv.org/content/10.1101/2024.11.14.623630v1

InterPLM: Discovering Interpretable Features in Protein Language Models via Sparse Autoencoders

Protein language models (PLMs) have demonstrated remarkable success in protein modeling and design, yet their internal mechanisms for predicting structure and function remain poorly understood. Here w...

November 19, 2024 at 7:36 PM

Elana Simon

@elanasimon.bsky.social

🛠️ Want to analyze your own protein models? (8/9)
- Code: github.com/ElanaPearl/interPLM
- Full framework for PLM interpretation
- Methods for training, analysis, and visualization

GitHub - ElanaPearl/InterPLM: Discovering Interpretable Features in Protein Language Models via Sparse Autoencoders

November 19, 2024 at 7:36 PM

Elana Simon

@elanasimon.bsky.social

✨ Explore the features yourself! (7/9)
- Interactive visualization: interplm.ai
- Explore features from every layer of ESM-2-8M
- See how proteins activate different features
- Examine structural patterns

November 19, 2024 at 7:36 PM

Elana Simon

@elanasimon.bsky.social

🧪 We can also steer model predictions by adjusting feature activations, demonstrating how understanding these representations could help guide protein design (6/9)

November 19, 2024 at 7:36 PM
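
Post 6/9 describes steering by adjusting feature activations. A minimal sketch of one common way to do this with a sparse autoencoder follows; the thread does not say how the authors implement it, so treat it as an assumption. The names `steer`, `toy_encode`, and `toy_decode` are hypothetical, and the toy dictionary stands in for a trained autoencoder.

```python
def steer(x, encode, decode, feature_idx, value):
    """Return a copy of embedding x with one feature clamped to `value`.

    The part of x that the autoencoder fails to reconstruct is added back,
    so only the chosen feature's contribution changes (an assumed design).
    """
    features = encode(x)
    error = [xi - ri for xi, ri in zip(x, decode(features))]
    steered = list(features)
    steered[feature_idx] = value
    return [ri + ei for ri, ei in zip(decode(steered), error)]


# Hypothetical toy dictionary: 3 features over a 2-dimensional embedding.
def toy_encode(x):
    return [max(0.0, x[0]), max(0.0, x[1]), 0.0]


def toy_decode(f):
    # Feature 2 writes to both embedding dimensions.
    return [f[0] + f[2], f[1] + f[2]]


embedding = [0.5, -0.25]
steered = steer(embedding, toy_encode, toy_decode, feature_idx=2, value=2.0)
# steered == [2.5, 1.75]
```

In a real model the steered embedding would be passed on to the remaining layers so that the change shows up in the predicted amino acids.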

Elana Simon

@elanasimon.bsky.social

🎯 Beyond understanding PLMs, these features have practical applications (5/9):
- Finding missing annotations in protein databases
- Identifying potentially new biological motifs
- Suggesting locations of binding sites and functional regions

November 19, 2024 at 7:36 PM

Elana Simon

@elanasimon.bsky.social

🤖 We showed LLMs can generate meaningful descriptions of many features - and these descriptions can be validated by successfully predicting which proteins would activate each feature! (4/9)

November 19, 2024 at 7:36 PM

Elana Simon

@elanasimon.bsky.social

📊 We identified up to 2,548 interpretable features per layer that match known biological concept annotations - compared to just 46 from individual neurons.

This suggests PLMs store biological information in superposition - multiple concepts sharing the same neurons! (3/9)

November 19, 2024 at 7:36 PM
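
Post 3/9 contrasts sparse-autoencoder features with individual neurons. As a rough illustration of the idea, here is a toy sparse autoencoder in plain Python: a wide ReLU layer re-expresses a narrow embedding as many non-negative features, and an L1 penalty pushes most of them to zero. The class name, sizes, initialization, and loss coefficient are all assumptions for illustration, not the InterPLM implementation.

```python
import random


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class ToySparseAutoencoder:
    """Hypothetical toy SAE with more features than embedding dimensions."""

    def __init__(self, d_model, n_features, seed=0):
        rng = random.Random(seed)
        self.w_enc = [[rng.gauss(0, 0.1) for _ in range(d_model)]
                      for _ in range(n_features)]
        self.b_enc = [0.0] * n_features
        self.w_dec = [[rng.gauss(0, 0.1) for _ in range(n_features)]
                      for _ in range(d_model)]
        self.b_dec = [0.0] * d_model

    def encode(self, x):
        # Center the input, project to the wide layer, keep positive parts.
        centered = [xi - bi for xi, bi in zip(x, self.b_dec)]
        pre = matvec(self.w_enc, centered)
        return [max(0.0, p + b) for p, b in zip(pre, self.b_enc)]

    def decode(self, features):
        # Rebuild the embedding as a weighted sum of decoder directions.
        return [r + b for r, b in zip(matvec(self.w_dec, features), self.b_dec)]

    def loss(self, x, l1_coeff=1e-3):
        # Reconstruction error plus a sparsity penalty on the features.
        features = self.encode(x)
        x_hat = self.decode(features)
        mse = sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)
        # Features are non-negative, so their sum equals their L1 norm.
        return mse + l1_coeff * sum(features)
```

Training would minimize `loss` over many embeddings; each learned feature can then be compared against annotations, which is harder to do with raw neurons when several concepts share them.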

Elana Simon

@elanasimon.bsky.social

🔍 Using InterPLM, we identified features in ESM-2 that detect various biological properties, from local motifs to complex structural patterns (2/9)
- Catalytic sites
- Zinc fingers
- Targeting sequences
- Post-translational modifications
- Structural elements and many more!

November 19, 2024 at 7:36 PM
