Jeffrey Ouyang-Zhang
@zhang-ouyang.bsky.social
ML + Bio (prev CV)
http://jozhang97.github.io
ooh also very curious 👀
November 27, 2024 at 2:06 PM
Could you add me to this list?
November 21, 2024 at 3:51 PM
Implementation is extremely simple. If you are using ESM2, you're just one line of code away from upgrading to ISM's enhanced capabilities. (7/7)
🔖 paper www.biorxiv.org/content/10.1...
💻 github: github.com/jozhang97/ISM
🤗 huggingface: huggingface.co/jozhang97/is...
November 13, 2024 at 12:40 AM
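A minimal sketch of that one-line swap using HuggingFace Transformers. The ISM checkpoint ID below is an assumption (the HuggingFace link above is truncated), so treat this as illustrative rather than the repo's exact usage:

```python
from transformers import AutoTokenizer, EsmModel

# ISM builds on ESM2, so the standard ESM2 tokenizer applies.
tokenizer = AutoTokenizer.from_pretrained("facebook/esm2_t33_650M_UR50D")

# Before: the standard ESM2 checkpoint.
# model = EsmModel.from_pretrained("facebook/esm2_t33_650M_UR50D")
# After: the one-line change, pointing at an ISM checkpoint instead.
model = EsmModel.from_pretrained("jozhang97/ism_t33_650M_uc30pdb")  # assumed checkpoint ID

inputs = tokenizer("MKTAYIAKQRQISFVK", return_tensors="pt")
embeddings = model(**inputs).last_hidden_state  # structure-enriched per-residue features
```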
To conclude, ISM takes sequence-only input but produces structurally rich representations. After all, the amino acid sequence is the only genetic information necessary for protein folding. Our structural loss better enables transformers to learn the sequence-to-structure mapping. (6/7)
November 13, 2024 at 12:40 AM
On structural benchmarks, we found that ISM's structural representations outperform those from existing sequence-only models and even match representations from models that take both structure and sequence as input. (5/7)
November 13, 2024 at 12:40 AM
ISM's secret sauce is a microenvironment-based autoencoder. The all-atom autoencoder learns to embed the tertiary structure surrounding a residue into a structure token. We distill these per-residue tokens (and MutRank tokens) into ESM2. (4/7)
November 13, 2024 at 12:40 AM
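A sketch of what distilling those per-residue tokens into ESM2 could look like. The dimensions, vocabulary size, and plain cross-entropy loss here are assumptions for illustration, not the authors' training code:

```python
import torch
import torch.nn as nn

# A head on top of ESM2 predicts the autoencoder's structure token at each residue.
hidden_dim, num_structure_tokens = 1280, 512  # assumed sizes
head = nn.Linear(hidden_dim, num_structure_tokens)

esm_embeddings = torch.randn(2, 100, hidden_dim)  # stand-in for ESM2 per-residue outputs
target_tokens = torch.randint(0, num_structure_tokens, (2, 100))  # stand-in for autoencoder tokens

logits = head(esm_embeddings)  # (batch, seq_len, num_structure_tokens)
loss = nn.functional.cross_entropy(logits.flatten(0, 1), target_tokens.flatten())
loss.backward()  # gradients reach the head (and ESM2, when fine-tuning end to end)
```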
Masked language modeling enables ESM2 to learn rich evolutionary features that capture a view of the structural landscape. However, ESM2 often underperforms structure-based models on downstream tasks.

We fine-tune ESM2 to predict representations from structure models. (3/7)
November 13, 2024 at 12:40 AM
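One way that fine-tuning objective might look, under assumed details: regress ESM2's per-residue embeddings onto a frozen structure model's representations, here with a cosine-similarity loss chosen purely for illustration:

```python
import torch
import torch.nn as nn

esm_dim, struct_dim = 1280, 1024       # assumed embedding sizes
proj = nn.Linear(esm_dim, struct_dim)  # align the two embedding spaces

esm_reps = torch.randn(2, 100, esm_dim)           # stand-in for ESM2 outputs
structure_reps = torch.randn(2, 100, struct_dim)  # stand-in for frozen structure-model outputs

# Maximize per-residue cosine similarity between predicted and target representations.
loss = 1 - nn.functional.cosine_similarity(proj(esm_reps), structure_reps, dim=-1).mean()
loss.backward()
```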
ISM is our latest protein language model, which enhances ESM2 with enriched structural representations. (2/7)
November 13, 2024 at 12:40 AM