Neil Thomas
countablyfinite.bsky.social
Research Scientist; AI + Biology
thomas-a-neil.github.io
Reposted by Neil Thomas
This October I’m drawing one molecule a day inspired by proteins @rcsb.bsky.social

Day 1/31
Prompt MUSTACHE
PDB 2QZI

Let’s start with something fun:
Mr. Potato Head’s ‘stache is made of Androgen Receptor that binds testosterone and helps maintain his male phenotype

Next prompt: WEAVE
suggestions?
October 2, 2025 at 3:09 AM
Reposted by Neil Thomas
We are excited to share GPN-Star, a cost-effective, biologically grounded genomic language modeling framework that achieves state-of-the-art performance across a wide range of variant effect prediction tasks relevant to human genetics.
www.biorxiv.org/content/10.1...
(1/n)
September 22, 2025 at 5:29 AM
slides remain the hardest modality
August 22, 2025 at 7:22 AM
Reposted by Neil Thomas
Antibodies are highly diverse, but most possible sequences are unstable or polyreactive. In this work, just published in Cell Syst., we propose a new source of data for modeling constraints from these properties. Our models show clear improvements in predicting Ab dysfunction. (1/n)
t.co/qCZERPUMPF
https://authors.elsevier.com/a/1lbX08YyDfuZWX
t.co
August 15, 2025 at 1:17 PM
Sign up for The Tournament!

🌍 Design a PETase - real-world impact on bioremediation!
🧬 Sponsored DNA synthesis and functional screening - no need for a lab!
🤖 Sponsored ESM inference through @evolutionaryscale.bsky.social Forge - if GPUs are a barrier!
🏆 Winners get published and win up to $15K!
August 13, 2025 at 9:09 PM
Reposted by Neil Thomas
Why PETase for our tournament? In 2024, the world made about 30 million tonnes of PET plastic, most from fossil fuels.

PETase can degrade PET, but isn’t ready for industrial-scale waste. The challenge: design an improved variant that can change that.

Register by Oct 17 alignbio.org/protein-engi...
August 13, 2025 at 5:38 PM
Reposted by Neil Thomas
A benchmark dataset of 614 experimentally characterized de novo designed monomers from 11 different design studies shows that:
- deep learning structural metrics only weakly predict success
- the score distribution is different for different types of structures

@grocklin.bsky.social
August 8, 2025 at 8:10 PM
Reposted by Neil Thomas
With Tom Lehrer's passing, I suppose this is a moment to share the story of the prank he played on the National Security Agency, and how it went undiscovered for nearly 60 years.
July 27, 2025 at 9:01 PM
Reposted by Neil Thomas
Stats friends... what would your estimator be if you were interested in a similar question as this study that is lighting Bluesky on fire tonight? 1/x
METR @metr.org · Jul 10
We ran a randomized controlled trial to see how much AI coding tools speed up experienced open-source developers.

The results surprised us: Developers thought they were 20% faster with AI tools, but they were actually 19% slower when they had access to AI than when they didn't.
July 11, 2025 at 12:08 AM
Reposted by Neil Thomas
1/4
🚀 Announcing the 2025 Protein Engineering Tournament.

This year’s challenge: design PETase enzymes, which degrade the type of plastic in bottles. Can AI-guided protein design help solve the climate crisis? Let’s find out! ⬇️

#AIforBiology #ClimateTech #ProteinEngineering #OpenScience
July 8, 2025 at 4:26 PM
Reposted by Neil Thomas
We're sponsoring the use of ESM3 and ESMC to help researchers engineer improved PETase enzymes in the @AlignBio 2025 Protein Engineering Tournament.

Get started using ESMC to predict protein function and ESM3 to generate new enzymes here: github.com/evolutionary...
July 8, 2025 at 6:01 PM
Reposted by Neil Thomas
Today I remembered my first QM parameterization of a small molecule failed miserably (turn volume ON for a full experience)
March 26, 2025 at 9:18 PM
Reposted by Neil Thomas
NIH funding supporting the HMMER and Infernal software projects has been terminated. NIH states that our work, as well as all other federally funded research at Harvard, is of no benefit to the US.
May 22, 2025 at 12:42 PM
Reposted by Neil Thomas
Next Tues (4/29) at **4:30PM** ET, we will have @ginaelnesr.bsky.social @hkws.bsky.social present "Learning millisecond protein dynamics from what is missing in NMR spectra"

Paper: biorxiv.org/content/10.1...

Sign up on our website for zoom links!
Learning millisecond protein dynamics from what is missing in NMR spectra
Many proteins’ biological functions rely on interconversions between multiple conformations occurring at micro- to millisecond (µs-ms) timescales. A lack of standardized, large-scale experimental data ...
biorxiv.org
April 22, 2025 at 9:08 PM
Reposted by Neil Thomas
Thrilled to see my digital art on the cover of Trends Genet. The two binary strings represent reverse-complementary DNA sequences (00=A, 01=C, 10=G, 11=T) and the connecting rectangles represent “embeddings” learned by DNA language models. Pls check out our article as well: doi.org/10.1016/j.ti...
April 7, 2025 at 3:01 PM
Reposted by Neil Thomas
Small proteins can be more complex than they look!

We know proteins fluctuate between different conformations- but by how much? How does it vary from protein to protein? Can highly stable domains have low stability segments? @ajrferrari.bsky.social experimentally tested >5,000 domains to find out!
March 26, 2025 at 4:21 PM
Reposted by Neil Thomas
Gene synthesis is often the most expensive part of protein engineering with generative models.

Happy to have played a small part in this work, where Chase developed a method for precision library construction at scale, with per-gene costs as low as $1.50.

@philromero.bsky.social
March 24, 2025 at 5:24 PM
Reposted by Neil Thomas
🎉Congrats to Chase on her new preprint! She developed OMEGA--a simple method for assembling custom gene panels for as little as $1.50 per gene. Big step forward for protein engineering and design!🧬
www.biorxiv.org/content/10.1...
Scalable and cost-efficient custom gene library assembly from oligopools
Advances in metagenomics, deep learning, and generative protein design have enabled broad in silico exploration of sequence space, but experimental characterization is still constrained by the cost an...
www.biorxiv.org
March 24, 2025 at 4:50 PM
So exciting to think what we will be able to do as we pair scaled library assembly techniques like these with ML-designed libraries and high throughput screening!
March 24, 2025 at 5:38 PM
Reposted by Neil Thomas
Protein dynamics was the first research to enchant me >10yrs ago, but I left in PhD bc I couldn't find big experimental data to evaluate models.

Today w @ginaelnesr.bsky.social, I'm thrilled to share the big dynamics data I've been dreaming of, and the mdl we trained w them: Dyna-1.
📝: rb.gy/de5axp
March 20, 2025 at 3:02 PM
Reposted by Neil Thomas
Protein function often depends on protein dynamics. To design proteins that function like natural ones, how do we predict their dynamics?

@hkws.bsky.social and I are thrilled to share the first big, experimental datasets on protein dynamics and our new model: Dyna-1!

🧵
March 20, 2025 at 3:02 PM
So proud to see our work on machine learning + enzyme design just published! www.cell.com/cell-systems...

Fun collaboration between Google X, DeepMind, and Triplebar that we hope can be a template for integrating ML and high throughput screening in protein engineering
Engineering highly active nuclease enzymes with machine learning and high-throughput screening
Thomas et al. introduce TeleProt, a framework for guiding protein library design with machine learning, and validate it in an enzyme engineering campaign to optimize the endonuclease NucB. Across 4 ro...
www.cell.com
March 12, 2025 at 5:18 PM
I'll be in Vancouver for NeurIPS / MLSB from Dec 13-16! If you're interested in meeting up, especially to discuss protein language models, reach out! :)
December 6, 2024 at 10:57 PM
Reposted by Neil Thomas
Introducing ESM Cambrian, a new family of protein language models, focused on creating representations of the underlying biology of proteins.
December 4, 2024 at 5:45 PM
Reposted by Neil Thomas
Large protein language models can learn complex epistatic interactions, but how much does that help with predicting variant effects? In this NeurIPS article, we show that classical independent-sites phylogenetic models can outperform pLMs on this task.
1/7
openreview.net/forum?id=H7m...
Ultrafast classical phylogenetic method beats large protein...
Amino acid substitution rate matrices are fundamental to statistical phylogenetics and evolutionary biology. Estimating them typically requires reconstructed trees for massive amounts of aligned...
openreview.net
November 16, 2024 at 8:42 PM