Frederick "Erick" Matsen
@matsen.bsky.social
I ♥ evolution, immunology, math, & computers. Professor at Fred Hutch & Investigator at HHMI. http://matsen.fredhutch.org/
... and second is to have a map from the figures to where they are made in the associated "experiments" code repository (github.com/matsengrp/dn...):
September 25, 2025 at 6:03 PM
I forgot to post two things I liked doing in this paper that I hope catch on. First is to have links in the methods section to the model fitting code (in a tagged version github.com/matsengrp/ne... as the code continues to evolve):
September 25, 2025 at 6:03 PM
Oh, and here is a picture of a cyborg-Darwin (cooked up by Gemini), after he realized how useful transformers are. For some reason MBE didn't want it as a cover image!
September 24, 2025 at 10:24 PM
Many thanks to Kevin Sung and Mackenzie Johnson for leading the all-important task of data prep, Will Dumm for code and methods contributions, David Rich for structural work, and Tyler Starr, Yun Song, Phil Bradley, Julia Fukuyama, and Hugh Haddox for conceptual help.
September 24, 2025 at 10:24 PM
We have positioned our group in this niche: we want to answer biological questions using ML-supercharged versions of the methods that scientists have been using for decades to derive insight.
More in this theme to come!
September 24, 2025 at 10:24 PM
Stepping back, I think that transformers and their ilk have so much to offer fields like molecular evolution. Now we can parameterize statistical models using a sequence as an input!
September 24, 2025 at 10:24 PM
If you want to give it a try, we have made it available using a simple `pretrained` interface. Here is a demo notebook. github.com/matsengrp/n...
netam/notebooks/dnsm_demo.ipynb at main · matsengrp/netam
September 24, 2025 at 10:24 PM
And because natural selection is predicted for individual sequences, we can also investigate changes in selection strength as a sequence evolves down a tree:
September 24, 2025 at 10:24 PM
Because this model isn't constrained to work with a fixed-width multiple sequence alignment, we can do things like look at per-site selection factors on sequences with varying CDR3 length:
September 24, 2025 at 10:24 PM
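A minimal sketch of why no fixed-width alignment is needed: any model that maps a sequence to one value per site handles varying lengths naturally. Everything here (the toy weights, the linear model, the exp link) is illustrative only, not the paper's architecture.

```python
import numpy as np

AA = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids
AA_INDEX = {a: i for i, a in enumerate(AA)}

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(len(AA),))  # stand-in for learned weights

def per_site_factors(seq):
    """Map an amino-acid sequence of any length to one positive
    selection factor per site (toy linear model with an exp link)."""
    logits = np.array([W[AA_INDEX[a]] for a in seq])
    return np.exp(logits)

short = per_site_factors("CARDY")          # CDR3-like, length 5
long_ = per_site_factors("CARDGYYYGMDV")   # CDR3-like, length 12
assert short.shape == (5,) and long_.shape == (12,)  # no fixed width needed
assert (short > 0).all() and (long_ > 0).all()
```

The point is only the shape of the interface: sequence in, same-length vector of positive factors out, with no alignment step in between.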
If a selection factor at a given site for a given sequence is
• > 1, the site is under diversifying selection
• = 1, the site is evolving neutrally
• < 1, the site is under purifying selection.
September 24, 2025 at 10:24 PM
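The thresholds above are easy to encode; here is a tiny helper (hypothetical, not part of the netam API) that mirrors the interpretation:

```python
def classify_selection(factor, tol=1e-9):
    """Interpret a per-site selection factor relative to 1,
    using the thresholds from the post."""
    if factor > 1 + tol:
        return "diversifying"
    if factor < 1 - tol:
        return "purifying"
    return "neutral"

assert classify_selection(2.3) == "diversifying"
assert classify_selection(1.0) == "neutral"
assert classify_selection(0.4) == "purifying"
```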
The model is above. In many ways it is like a classical model of mutation and selection, but the mutation model is convolutional and the selection model is a transformer encoder mapping an amino-acid (AA) sequence to a vector of selection factors of the same length as the sequence.
September 24, 2025 at 10:24 PM
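One way to picture the mutation-selection decomposition is that each site's substitution probability is driven by a neutral mutation rate rescaled by that site's selection factor. The combination rule below (1 − exp(−rate × factor)) is an assumed sketch for illustration; the paper's exact parameterization may differ.

```python
import numpy as np

def substitution_probs(mutation_rates, selection_factors):
    """Combine per-site neutral mutation rates with per-site selection
    factors into substitution probabilities in [0, 1]."""
    rates = np.asarray(mutation_rates) * np.asarray(selection_factors)
    return 1.0 - np.exp(-rates)

mut = np.array([0.01, 0.05, 0.02])  # neutral per-site mutation rates
sel = np.array([0.2, 1.0, 3.0])     # purifying, neutral, diversifying
p = substitution_probs(mut, sel)
assert p[0] < 0.01  # purifying selection suppresses substitution
assert p[2] > 0.02  # diversifying selection amplifies it
```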
Hats off to first author Kevin Sung www.linkedin.com/in/kevinsun... and the rest of the team 🙏 !
September 18, 2025 at 10:46 PM
I was very proud to get "The authors are to be commended for their efforts to communicate with the developers of previous models and use the strongest possible versions of those in their current evaluation" in peer reviews:
elifesciences.org/articles/10...
Peer review in Thrifty wide-context models of B cell receptor somatic hypermutation
Convolutional embedding models efficiently capture wide sequence context in antibody somatic hypermutation, avoiding exponential k-mer parameter scaling and eliminating the need for per-site modeling.
September 18, 2025 at 10:46 PM
Pretrained models are available at github.com/matsengrp/n..., and the computational experiments are at github.com/matsengrp/t....
GitHub - matsengrp/thrifty-experiments-1
September 18, 2025 at 10:46 PM
It's possible that the failure of more complex models to dominate more decisively comes from a lack of suitable training data, namely neutrally evolving out-of-frame sequences. We tried to augment the training data, but with no luck.
September 18, 2025 at 10:46 PM
The resulting models are better than 5-mer models, but only modestly so. We made many efforts to include a per-site rate, but concluded that its effects were weak enough that including it did not improve model performance.
September 18, 2025 at 10:46 PM
Solution: first embed 3-mers; then the number of parameters grows only linearly with the context width.
September 18, 2025 at 10:46 PM
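The scaling argument can be made concrete with back-of-envelope parameter counts. The embedding dimension and channel count below are made-up illustrative numbers, not the paper's actual architecture; the comparison only shows exponential versus linear growth.

```python
def kmer_params(k):
    """A full k-mer table needs one parameter per nucleotide context."""
    return 4 ** k

def embed_then_convolve_params(context_width, embed_dim=8, channels=16):
    """Embed 3-mers (4**3 = 64 embedding rows), then apply a convolution
    whose kernel spans the context: parameter count is linear in width."""
    embedding = 64 * embed_dim
    kernel = context_width * embed_dim * channels
    return embedding + kernel

assert kmer_params(5) == 1024                 # classic 5-mer table
assert kmer_params(13) == 67_108_864          # exponential blow-up
assert embed_then_convolve_params(13) == 2176 # stays small at wide context
```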