Etowah Adams
@etowah0.bsky.social
enjoying and bemoaning biology. phd student
@columbia prev. @harvardmed @ginkgo @yale
Not all predictive features are easily interpretable. For instance, a predictor for membrane localization, latent L28/3154, predominantly activates on poly-alanine sequences, whose functional relevance remains unclear.
February 10, 2025 at 4:12 PM
When trained to predict thermostability, the linear models assign their most positive weights to features that correlate with hydrophobic amino acids and their most negative weights to a feature activating on glutamate, an amino acid linked to instability.
February 10, 2025 at 4:12 PM
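A minimal sketch of this kind of probe (not the authors' code): fit a linear model on per-protein SAE latent activations to predict thermostability, then read off the most positive and most negative weights. The file names and the way the feature matrix is built are assumptions.

```python
# Hypothetical probe: linear regression from SAE latent features to thermostability.
import numpy as np
from sklearn.linear_model import Ridge

# X: (n_proteins, n_latents) mean SAE activation per protein (assumed precomputed)
# y: (n_proteins,) measured thermostability values (assumed labels)
X = np.load("sae_latent_means.npy")
y = np.load("thermostability.npy")

model = Ridge(alpha=1.0).fit(X, y)

# Latents with extreme weights are candidates for interpretation
# (e.g. hydrophobicity- or glutamate-associated features).
order = np.argsort(model.coef_)
print("most negative latents:", order[:5])
print("most positive latents:", order[-5:])
```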
For the extracellular class, we uncover features that activate on signal peptides, which trigger the cellular mechanisms that transport proteins outside the cell.
February 10, 2025 at 4:12 PM
For example, we trained a linear classifier on subcellular localization labels (nucleus, cytoplasm, mitochondrion…). Excitingly, the most highly weighted features for the nucleus class recognize nuclear localization signals (NLS).
February 10, 2025 at 4:12 PM
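Here is a rough sketch of how such a classifier could be set up (an illustration with assumed inputs, not the paper's exact pipeline): a multiclass logistic regression over SAE latent features, followed by inspecting the top-weighted latents for the nucleus class.

```python
# Hypothetical multiclass probe for subcellular localization over SAE latents.
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.load("sae_latent_means.npy")                       # (n_proteins, n_latents), assumed
labels = np.load("localization.npy", allow_pickle=True)   # e.g. "nucleus", "cytoplasm", ...

clf = LogisticRegression(max_iter=2000).fit(X, labels)

# Highest-weighted latents for the "nucleus" class; these are the ones to
# inspect for nuclear localization signal (NLS) activation patterns.
nucleus_idx = list(clf.classes_).index("nucleus")
top_latents = np.argsort(clf.coef_[nucleus_idx])[-10:]
print("highest-weighted latents for nucleus:", top_latents)
```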
We categorize SAE latents by family specificity and by activation pattern per layer. We find more family-specific features in middle layers, and more single-token-activating (point) features in later layers. We also study how SAE hyperparameters change which features are uncovered.
February 10, 2025 at 4:12 PM
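One way to compute these categories, sketched under assumptions (the activation tensor, family labels, and thresholds below are placeholders, not the paper's exact definitions):

```python
# Hypothetical scoring of SAE latents for family specificity and "point" activation.
import numpy as np

# acts: (n_sequences, seq_len, n_latents) SAE activations, zero-padded (assumed)
# families: (n_sequences,) protein family labels, e.g. Pfam IDs (assumed)
acts = np.load("sae_acts.npy")
families = np.load("families.npy", allow_pickle=True)

active = (acts > 0).any(axis=1)  # (n_sequences, n_latents): does latent fire anywhere?

for latent in range(acts.shape[-1]):
    fams = families[active[:, latent]]
    if len(fams) == 0:
        continue
    # Family specificity: fraction of activating sequences from the single most common family
    _, counts = np.unique(fams, return_counts=True)
    specificity = counts.max() / counts.sum()
    # Point-ness: mean number of activating positions per activating sequence
    tokens_per_seq = (acts[active[:, latent], :, latent] > 0).sum(axis=1).mean()
    if specificity > 0.9:
        print(f"latent {latent}: family-specific ({specificity:.2f})")
    if tokens_per_seq < 2:
        print(f"latent {latent}: point-like ({tokens_per_seq:.1f} tokens/seq)")
```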
We find many features appear highly specific to certain protein families, suggesting pLMs contain an internal notion of protein families which mirrors known homology-based families. Other features correspond to broader, family-independent coevolutionary signals.
February 10, 2025 at 4:12 PM
Using sparse autoencoders (SAEs), we find many interpretable features in a protein language model (pLM), ESM-2, ranging from secondary structure to conserved domains to context-specific properties. Explore them at interprot.com!
February 10, 2025 at 4:12 PM
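For orientation, a minimal sketch of the overall pipeline (with assumptions): get per-residue embeddings from ESM-2, then pass them through a trained SAE encoder to obtain sparse latent activations. The layer choice and the SAE weights below are placeholders, not InterProt's released artifacts.

```python
# Hypothetical pipeline: ESM-2 per-residue embeddings -> SAE latent activations.
import torch
import esm  # fair-esm package

model, alphabet = esm.pretrained.esm2_t33_650M_UR50D()
model.eval()
batch_converter = alphabet.get_batch_converter()

data = [("protein1", "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ")]
_, _, tokens = batch_converter(data)

with torch.no_grad():
    reps = model(tokens, repr_layers=[24])["representations"][24]  # (1, L, d_model)

# Stand-in SAE encoder: latents = ReLU(x @ W_enc + b_enc); in practice W_enc, b_enc
# would be learned weights, not random placeholders.
d_model, n_latents = reps.shape[-1], 4096
W_enc = torch.randn(d_model, n_latents) * 0.01
b_enc = torch.zeros(n_latents)
latents = torch.relu(reps @ W_enc + b_enc)  # sparse per-residue features
print(latents.shape)
```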
Can we learn protein biology from a language model?

In new work led by @liambai.bsky.social and me, we explore how sparse autoencoders can help us understand biology—going from mechanistic interpretability to mechanistic biology.
February 10, 2025 at 4:12 PM