ScienceChihuahua
sciencechihuahua.bsky.social

Interests: bioinformatics, cheminformatics, machine learning, Bayesian stats, science fiction and chihuahuas.
Reposted by ScienceChihuahua
🔥 Benchmark Alert! MotifBench sets a new standard for evaluating protein design methods in motif scaffolding.
Why does this matter? Reproducibility & fair comparison have been lacking—until now.
Paper: arxiv.org/abs/2502.12479 | Repo: github.com/blt2114/Moti...
A thread ⬇️
February 19, 2025 at 8:50 PM
Reposted by ScienceChihuahua
move over ligand RMSD < 2 Å 😤 ConfBench is on the scene!

if you're interested in the evaluation of conformational accuracy of structure prediction methods, take a look at our first stab at a systematic conformational benchmark in the NP3 technical report below! 🧵

www.iambic.ai/post/np3-tec...
December 17, 2024 at 4:37 AM
Reposted by ScienceChihuahua
I'm thrilled to announce a new preprint describing collaborative work with Ajay Jain and Ann Cleves Jain, "Deep-Learning Based Docking Methods: Fair Comparisons to Conventional Docking Workflows".

arxiv.org/abs/2412.02889
Deep-Learning Based Docking Methods: Fair Comparisons to Conventional Docking Workflows
The diffusion learning method, DiffDock, for docking small-molecule ligands into protein binding sites was recently introduced. Results included comparisons to more conventional docking approaches, wi...
arxiv.org
December 5, 2024 at 4:21 PM
Reposted by ScienceChihuahua
A common question nowadays: Which is better, diffusion or flow matching? 🤔

Our answer: They’re two sides of the same coin. We wrote a blog post to show how diffusion models and Gaussian flow matching are equivalent. That’s great: It means you can use them interchangeably.
December 2, 2024 at 6:45 PM
Reposted by ScienceChihuahua
Large protein language models can learn complex epistatic interactions, but how much does that help with predicting variant effects? In this NeurIPS article, we show that classical independent-sites phylogenetic models can outperform pLMs on this task.
1/7
openreview.net/forum?id=H7m...
Ultrafast classical phylogenetic method beats large protein...
Amino acid substitution rate matrices are fundamental to statistical phylogenetics and evolutionary biology. Estimating them typically requires reconstructed trees for massive amounts of aligned...
openreview.net
November 16, 2024 at 8:42 PM
Interesting, although worth noting the MoleculeNet benchmarks have some issues (e.g. practicalcheminformatics.blogspot.com/2023/08/we-n... ). Reminds me of this paper (www.biorxiv.org/content/10.1...), in which billion-compound chemical LLMs failed to improve on ECFP fingerprints
Fresh off the presses:
In "Learning on compressed molecular representations" Jan Weinreich and I looked into whether GZIP performed better than Neural Networks in chemical machine learning tasks. Yes, you've read that right.

TL;DR: Yes, GZIP can perform better than baseline GNNs and MLPs. It can ..
Learning on compressed molecular representations
Last year, a preprint gained notoriety, proposing that a k-nearest neighbour classifier is able to outperform large-language models using compressed text as input and normalised compression distance (...
pubs.rsc.org
November 21, 2024 at 5:21 PM
Reposted by ScienceChihuahua
@ my san diego comp chemists!

december SAGIM event was announced!

looking forward to connecting with fellow comp chemists and modelers over craft beers at new english brewery this december 5th - drop by between 4-7pm

www.meetup.com/southern-cal...
December SAGIM Happy Hour, Thu, Dec 5, 2024, 4:00 PM | Meetup
Join us for a SAGIM happy hour at New English Brewery. Connect and network with other San Diego based computational chemists, modelers and informaticians working in biotech
www.meetup.com
November 20, 2024 at 8:43 PM
Reposted by ScienceChihuahua
Looks like Bryan Dickinson isn’t here yet so I’m cross posting his challenge:

“if you or someone you know thinks they can actually predict PPIs, prove it. Here is a link to our blinded protein sequence data”
dickinsonlab.uchicago.edu/ppi-challenge
PPI Challenge | dickinson-group
dickinsonlab.uchicago.edu
November 14, 2024 at 12:01 AM
I like nuclear power -- low-carbon, long-term source of power that IMO has been held back by prejudice & made uneconomical by red tape. It's a pleasant surprise to read that Google / Amazon are (apparently) funding dev, but regulatory hurdles are likely formidable...

arstechnica.com/science/2024...
Amazon invests in nuclear power
Amazon is investing in small modular nuclear reactors. While they’re radiation safe, they’re a risky investment.
arstechnica.com
October 17, 2024 at 2:22 PM
Reposted by ScienceChihuahua
A gigantic case of scientific fraud, spanning over 25 years and involving dozens of publications, roils the National Institute on Aging. Their head of the neuroscience division is the common denominator, and he's been removed:
Fraud, So Much Fraud
www.science.org
September 27, 2024 at 4:35 PM
A summary of the NeurIPS 2024 BELKA ML for chemistry Kaggle competition, with > 100 million datapoints and > 2,000 teams. On test compounds very different from training data, none of the submissions beat random chance. ML for chem still has a lot of room to improve: leashbio.substack.com/p/belka-resu...
BELKA results suggest computers can memorize, but not create, drugs
We bet machines will create novel druglike material in the future, but probably not now
leashbio.substack.com
August 26, 2024 at 4:12 AM
Reposted by ScienceChihuahua
I’m beyond thrilled to share that our work on using deep learning to compute excited states of molecules is out today in Science Magazine! This is the first time that deep learning has accurately solved some of the hardest problems in quantum physics. www.science.org/doi/full/10....
www.science.org
August 22, 2024 at 6:20 PM
Reposted by ScienceChihuahua
What happened when MIT stopped paying Elsevier? Not much except they are saving a lot of money. "MIT is interested in collaborating with other libraries to reinvest these funds in community-controlled open publishing initiatives..." sparcopen.org/our-work/big...
August 16, 2024 at 8:49 AM
Nice post on how much noise gen AI for chemistry can produce / how much scrutiny its output requires, and how popular press articles that claim "AI is designing drugs" are exaggerating (a lot). Unfortunately the hype just won't stop...

practicalcheminformatics.blogspot.com/2024/05/gene...
Generative Molecular Design Isn't As Easy As People Make It Look
I was taken aback by a recent CNBC article entitled “ Generative AI will be designing new drugs all on its own in the near future ”.  I shou...
practicalcheminformatics.blogspot.com
May 24, 2024 at 3:32 AM
Reposted by ScienceChihuahua
Elementary Physics Paths xkcd.com/2933
May 16, 2024 at 2:13 PM
This paper achieves results substantially better than RFDiffusion for de novo antibody design (7 mAbs with Kd < 25nM, good stability / Tm), but by using docking & force fields (no deep learning). Like the RFDiffusion paper, however, success rate is low.

www.biorxiv.org/content/10.1...
www.biorxiv.org
May 3, 2024 at 5:04 PM
Reposted by ScienceChihuahua
Foldseek-Multimer is a protein complex aligner that is up to 10,000x faster than SOTA methods without sacrificing quality, enabling the comparison of billions of complex pairs per day.
🧬🧶
📄 www.biorxiv.org/content/10.1...
💾 github.com/steineggerla...
🕸️ search.foldseek.com
Rapid and Sensitive Protein Complex Alignment with Foldseek-Multimer
bioRxiv - the preprint server for biology, operated by Cold Spring Harbor Laboratory, a research and educational institution
www.biorxiv.org
April 15, 2024 at 9:14 AM
Reposted by ScienceChihuahua
Required reading by @lpachter.bsky.social. Seurat vs Scanpy processing, and how package versions affect your analysis 👀 www.biorxiv.org/content/10.1...
The impact of package selection and versioning on single-cell RNA-seq analysis
bioRxiv - the preprint server for biology, operated by Cold Spring Harbor Laboratory, a research and educational institution
www.biorxiv.org
April 5, 2024 at 9:21 PM
Kind of amazed at how good Netflix’s Three Body Problem is. It’s rare to find a show that took a book you really liked…and made it even better. I really hope they follow through and make a season 2!
March 29, 2024 at 2:24 PM
SPICE 2.0 is out, a major update to this dataset for training ML force fields:

github.com/openmm/spice...
Release SPICE 2.0.0 · openmm/spice-dataset
This is a major update that roughly doubles the total amount of data. It particularly focuses on increasing the amount of chemical diversity and improving sampling of nonbonded interactions. It c...
github.com
March 20, 2024 at 12:49 AM
Thought-provoking post by Michael Bronstein. He argues that AlphaFold is an isolated success that is hard to replicate in other bio/chem problems, and that current foundation models won't "solve bio". He promotes a "black-box data" approach (I have mixed feelings about this but it's an interesting argument):

t.co/yE5uFPEof2
The Road to Biology 2.0 Will Pass Through Black-Box Data
Future bio-AI breakthroughs will arise from novel high-throughput low-cost AI-specific “black-box” data modalities.
t.co
March 19, 2024 at 2:36 PM
Love the way this paper demonstrates in detail how misleading t-SNE / UMAP can be. Not a problem if used only for data exploration, but I keep seeing papers where a UMAP is used to support a hypothesis. Read this if you're not already convinced that's a bad idea

journals.plos.org/ploscompbiol...
The specious art of single-cell genomics
Dimensionality reduction is standard practice for filtering noise and identifying relevant features in large-scale data analyses. In biology, single-cell genomics studies typically begin with reductio...
journals.plos.org
February 19, 2024 at 3:56 PM
It’s interesting how much more spam you get from predatory journals after publishing on BioRxiv (a lot) vs ArXiv (none). Haven’t used ChemRxiv but if I had to guess it’s probably more like BioRxiv in this regard
February 14, 2024 at 10:52 AM
Nice paper -- presents more evidence that structure prediction models haven't learned the physics of protein folding, and explores in depth what ESM has learned. "Our results caution against assuming pLMs as oracles of protein properties..."

www.biorxiv.org/content/10.1...
February 1, 2024 at 3:55 PM
Reposted by ScienceChihuahua
This essay by Jennifer Listgarten in Nature Biotechnology is well worth reading in the face of all the magical thinking on what chatbots can do for science www.nature.com/articles/s41...
January 25, 2024 at 4:57 PM