Elana Simon
elanasimon.bsky.social
🛠️ Want to analyze your own protein models? (8/9)
- Code: github.com/ElanaPearl/interPLM
- Full framework for PLM interpretation
- Methods for training, analysis, and visualization
GitHub - ElanaPearl/InterPLM: Discovering Interpretable Features in Protein Language Models via Sparse Autoencoders
November 19, 2024 at 7:36 PM
✨ Explore the features yourself! (7/9)
- Interactive visualization: interplm.ai
- Explore features from every layer of ESM-2-8M
- See how proteins activate different features
- Examine structural patterns
🧪 We can also steer model predictions by adjusting feature activations, demonstrating how understanding these representations could help guide protein design (6/9)
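A minimal sketch of what "adjusting feature activations" can look like. It assumes a standard ReLU sparse autoencoder (f = ReLU(W_enc x + b_enc), x_hat = W_dec f + b_dec) and assumes the SAE's reconstruction error is added back, so only the clamped feature changes. The `steer` helper and all weights are hypothetical toy values, not InterPLM's code, and passing the edited embedding through the remaining ESM-2 layers is not shown.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def encode(x: Vector, w_enc: Matrix, b_enc: Vector) -> Vector:
    """SAE encoder: f = ReLU(W_enc x + b_enc)."""
    return [max(0.0, h + b) for h, b in zip(matvec(w_enc, x), b_enc)]


def decode(f: Vector, w_dec: Matrix, b_dec: Vector) -> Vector:
    """SAE decoder: x_hat = W_dec f + b_dec."""
    return [h + b for h, b in zip(matvec(w_dec, f), b_dec)]


def steer(x, feature, value, w_enc, b_enc, w_dec, b_dec):
    """Hypothetical helper: clamp one feature to `value`, return the edited embedding."""
    f = encode(x, w_enc, b_enc)
    # Keep whatever the SAE fails to reconstruct, so only the chosen feature changes.
    error = [xi - ri for xi, ri in zip(x, decode(f, w_dec, b_dec))]
    f_steered = list(f)
    f_steered[feature] = value
    return [r + e for r, e in zip(decode(f_steered, w_dec, b_dec), error)]


# Toy SAE: 2-d embedding, 3 features (hypothetical hand-set weights).
W_ENC = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
B_ENC = [0.0, 0.0, -1.0]
W_DEC = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5]]
B_DEC = [0.0, 0.0]

x = [1.0, 2.0]
print(steer(x, 2, 0.0, W_ENC, B_ENC, W_DEC, B_DEC))  # switch feature 2 off → [0.0, 1.0]
```

Clamping a feature to its current value returns the original embedding; changing it shifts the embedding along that feature's decoder column by (new - old) times that column.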
🎯 Beyond understanding PLMs, these features have practical applications (5/9):
- Finding missing annotations in protein databases
- Identifying potentially new biological motifs
- Suggesting locations of binding sites and functional regions
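For the first use case, a hypothetical sketch: given one feature's per-residue activations and a boolean annotation track for the concept it matches, flag residues where the feature fires strongly but the database has no annotation. The function name, threshold, and data are assumptions for illustration; hits are candidates to check, not confirmed annotations.

```python
from typing import List, Sequence


def candidate_missing_annotations(
    activations: Sequence[float], annotated: Sequence[bool], threshold: float
) -> List[int]:
    """Residue indices where the feature fires at or above `threshold`
    but the residue carries no annotation for the matched concept."""
    return [
        i
        for i, (act, has_label) in enumerate(zip(activations, annotated))
        if act >= threshold and not has_label
    ]


# Made-up per-residue values for one protein.
acts = [0.05, 0.92, 0.88, 0.10, 0.75]
labels = [False, True, False, False, False]
print(candidate_missing_annotations(acts, labels, threshold=0.5))  # → [2, 4]
```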
🤖 We showed LLMs can generate meaningful descriptions of many features - and these descriptions can be validated by successfully predicting which proteins would activate each feature! (4/9)
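One way such a check could be scored, sketched as an assumption rather than the exact protocol: the LLM reads only the description and predicts an activation score for held-out proteins, and the prediction is compared with the feature's measured activations via Pearson correlation. All numbers below are made up.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Made-up scores for five held-out proteins.
llm_predicted = [0.0, 0.1, 0.8, 0.9, 0.2]  # LLM's guess from the description alone
measured = [0.02, 0.15, 0.70, 0.95, 0.05]  # feature's actual max activation
print(round(pearson(llm_predicted, measured), 3))
```

A description that captures what the feature detects should give a correlation near 1; a vague or wrong one should land near 0.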
📊 We identified up to 2,548 interpretable features per layer that match known biological concept annotations - compared to just 46 from individual neurons.

This suggests PLMs store biological information in superposition - multiple concepts sharing the same neurons! (3/9)
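A toy illustration of superposition with hand-picked (not learned) values: two "neurons" have to carry three concepts, so concept C's direction overlaps both neurons, and reading either neuron alone is ambiguous. An SAE-style encoder, assumed here to be a tied-weight ReLU with one shared bias, still recovers which single concept is active. This sketches the idea only; it is not InterPLM's trained SAE.

```python
import math
from typing import List

s = math.sqrt(0.5)
# Hypothetical toy: 2 hidden dims ("neurons") but 3 concept directions.
FEATURE_DIRECTIONS = [
    [1.0, 0.0],  # concept A
    [0.0, 1.0],  # concept B
    [s, s],      # concept C shares both neurons with A and B
]
THRESHOLD = 0.8  # hypothetical encoder bias


def embed(active: List[int]) -> List[float]:
    """Superpose the directions of the active concepts into one 2-d vector."""
    x = [0.0, 0.0]
    for i in active:
        x = [xi + di for xi, di in zip(x, FEATURE_DIRECTIONS[i])]
    return x


def sae_encode(x: List[float]) -> List[float]:
    """ReLU(d_i . x - THRESHOLD) for each concept direction d_i."""
    return [
        max(0.0, sum(d * v for d, v in zip(direction, x)) - THRESHOLD)
        for direction in FEATURE_DIRECTIONS
    ]


def active_features(x: List[float]) -> List[int]:
    return [i for i, f in enumerate(sae_encode(x)) if f > 0.0]


x_c = embed([2])
print(x_c)                   # both neurons respond to concept C
print(active_features(x_c))  # → [2]
```

If A and B are active together, this toy encoder also reports C: superposition works cleanly only when concepts rarely co-occur, which is the sparsity a sparse autoencoder assumes.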
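Deciding whether a feature "matches" a known annotation (the 2,548-feature count in 3/9) can be framed as per-residue classification: threshold the feature's activation and compare with the annotation labels. The sketch computes F1 at a single threshold; the threshold, the toy data, and the choice of F1 are assumptions for illustration, not the exact InterPLM evaluation.

```python
from typing import Sequence


def f1_at_threshold(
    activations: Sequence[float], labels: Sequence[bool], threshold: float
) -> float:
    """Per-residue F1 of 'feature active' vs. 'residue annotated with the concept'."""
    tp = fp = fn = 0
    for act, label in zip(activations, labels):
        predicted = act >= threshold
        if predicted and label:
            tp += 1
        elif predicted:
            fp += 1
        elif label:
            fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up activations and annotation labels for five residues.
acts = [0.9, 0.7, 0.2, 0.6, 0.0]
labels = [True, True, True, False, False]
print(f1_at_threshold(acts, labels, threshold=0.5))
```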
🔍 Using InterPLM, we identified features in ESM-2 that detect various biological properties, from local motifs to complex structural patterns (2/9)
- Catalytic sites
- Zinc fingers
- Targeting sequences
- Post-translational modifications
- Structural elements and many more!