Aya Abdelsalam Ismail
asalamismail.bsky.social
Research Scientist @prescientdesign @Genentech Former PhD @umdcs

ayaismail.com
[8/n] More details in the paper: arxiv.org/abs/2411.06090 and our blog post ncfrey.substack.com/p/building-t....

You can also find the code on GitHub: github.com/prescient-de... and model weights on 🤗.
Concept Bottleneck Language Models For protein design
We introduce Concept Bottleneck Protein Language Models (CB-pLM), a generative masked language model with a layer where each neuron corresponds to an interpretable concept. Our architecture offers thr...
arxiv.org
December 12, 2024 at 10:50 PM
[7/n] Our architecture lets us see which concepts the model has learned, and which concepts it uses during inference, by inspecting the weights of the final linear layer; this offers a way to debug and assess the model's quality.
December 12, 2024 at 10:50 PM
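To make the weight-inspection idea in [7/n] concrete, here is a minimal toy sketch. It assumes the final linear layer can be read as a matrix with one column per concept and one row per amino acid. The concept names, amino-acid subset, weight values, and the helper `top_tokens` are all made up for illustration and are not taken from the paper or the released code.

```python
# Toy sketch (hypothetical names and numbers, not the paper's weights):
# read off which amino acids a concept neuron pushes up in the final linear layer.
AMINO_ACIDS = ["A", "D", "K", "L"]
CONCEPTS = ["charge", "hydrophobicity"]

# WEIGHTS[i][j] is the weight from concept j to the logit of amino acid i.
WEIGHTS = [
    [0.1, 0.6],    # A
    [-0.9, -0.7],  # D
    [0.8, -0.5],   # K
    [0.0, 0.9],    # L
]


def top_tokens(concept, k=2):
    """Return the k amino acids with the largest weight for one concept."""
    j = CONCEPTS.index(concept)
    pairs = [(aa, row[j]) for aa, row in zip(AMINO_ACIDS, WEIGHTS)]
    return sorted(pairs, key=lambda p: p[1], reverse=True)[:k]


for concept in CONCEPTS:
    print(concept, top_tokens(concept))
```

In this toy matrix the "charge" column ranks lysine (K) highest and aspartate (D) lowest, which is the kind of sanity check a biologist could run against a trained layer.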
[6/n] Interpretability: The concept bottleneck can be used to understand which concept the model uses to predict a certain amino acid. Reliably controlling model behavior: The concepts can be used as knobs to control the model's output.
December 12, 2024 at 10:50 PM
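The "knobs" idea in [6/n] can be sketched with a toy linear decoder on top of a concept layer. This is a simplified illustration under stated assumptions: the logits here depend only on the concept activations, and the concept names, weights, and the helpers `predict` and `intervene` are hypothetical rather than the released CB-pLM API.

```python
# Toy sketch (hypothetical names and numbers): overwrite one concept activation
# and observe how the predicted amino acid changes.
AMINO_ACIDS = ["A", "D", "K", "L"]
CONCEPTS = ["charge", "hydrophobicity"]

# WEIGHTS[i][j] is the weight from concept j to the logit of amino acid i.
WEIGHTS = [
    [0.1, 0.6],    # A
    [-0.9, -0.7],  # D
    [0.8, -0.5],   # K
    [0.0, 0.9],    # L
]


def predict(concept_values):
    """Linear decoder: logits = WEIGHTS @ concept_values, then argmax."""
    logits = [sum(w * c for w, c in zip(row, concept_values)) for row in WEIGHTS]
    return AMINO_ACIDS[logits.index(max(logits))]


def intervene(concept_values, concept, new_value):
    """Return a copy of the concept activations with one concept overwritten."""
    edited = list(concept_values)
    edited[CONCEPTS.index(concept)] = new_value
    return edited


original = [0.0, 1.0]                          # neutral charge, hydrophobic
edited = intervene(original, "charge", 2.0)    # turn the charge knob up

print(predict(original))  # L
print(predict(edited))    # K
```

Because each activation maps to a named concept, the edit is targeted: raising "charge" moves the toy prediction from leucine to lysine while the hydrophobicity activation is left untouched.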
[5/n] We train masked language models with up to 3 billion parameters that include a layer directly encoding the biophysical and biochemical concepts that biologists care about. These models match the performance of unconstrained masked language models.
December 12, 2024 at 10:50 PM
[3/n] In our concept bottleneck protein language model paper, we show that we can train models with billions of parameters under interpretability constraints without performance degradation.
December 12, 2024 at 10:50 PM
[2/n] But the thing is, more often than not, we know beforehand what we want/expect our model to learn, especially in very well-studied domains like Biology. So, instead of playing the guessing game, we trained a model that explicitly learns different concepts that biologists care about.
December 12, 2024 at 10:50 PM