Leif Sieben
leif7ieben.bsky.social
Master's student @ ETH Zurich
Visiting student @ MIT and Broad Institute

Working on machine learning for drug discovery and bringing all of chemistry into the age of big data and AI
You learn a lot about the underlying system design of your apps when you run them in a low-data environment.
October 26, 2025 at 10:00 PM
Reposted by Leif Sieben
A fundamental lesson of modern AI is that scale is essential: training bigger models on bigger datasets unlocks new capabilities. A fundamental lesson of AI engineering is that scaling up isn't trivial: it is not just a matter of spending more money and resources.
September 22, 2025 at 12:15 PM
Strong Platonic Representation Hypothesis: the universal latent structure of text representations not only exists, but can be learned and, furthermore, harnessed to translate representations from one space to another without any paired data or encoders.
September 7, 2025 at 7:35 AM
A friend recommended this Substack post from Leash Bio to me.

If there ever was a masterclass in how to split data correctly, this is it.

Respect your chemistry folks!

open.substack.com/pub/leashbio...
Data contamination is all random forest needs
Here's why we believe our Hermes prediction results are real
open.substack.com
August 3, 2025 at 6:26 PM
Reposted by Leif Sieben
Not sure who came up with "Manhattan Plot", but in 2014 I coined the alternative term "Nijmegen Plot" (inspired by the Dutch town where I live) to describe underwhelming results from our earliest genome-wide association scans of language/reading traits.
July 28, 2025 at 4:42 PM
Reposted by Leif Sieben
Love these maps of "street-text sightings" in The Pudding's latest piece
pudding.cool/2025/07/stre...
July 28, 2025 at 2:22 PM
Reposted by Leif Sieben
Great blog post on rotary position embeddings (RoPE) in more than one dimension, with interactive visualisations, a bunch of experimental results, and code!
On N-dimensional Rotary Positional Embeddings
An exploration of N-dimensional rotary positional embeddings (RoPE) for vision transformers.
jerryxio.ng
July 28, 2025 at 2:51 PM
Reposted by Leif Sieben
Can an AI model predict perfectly and still have a terrible world model?

What would that even mean?

Our new ICML paper (poster tomorrow!) formalizes these questions.

One result tells the story: A transformer trained on 10M solar systems nails planetary orbits. But it botches gravitational laws 🧵
July 14, 2025 at 1:50 PM
Reposted by Leif Sieben
Today's #RDKit blog post is a heartfelt plea for clearer communication.
greglandrum.github.io/rdkit-blog/p...
Please stop saying “The Tanimoto similarity is” – RDKit blog
A simple tip to explain what you actually did
greglandrum.github.io
July 17, 2025 at 11:22 AM
Reposted by Leif Sieben
There is a new startup from China called Moonshot.

The original “moonshot” was the Apollo Program.

An AI based moonshot could be referred to as an “AI pollo” program.

“ai pollo” in Italian means something like “to the chicken”.
July 13, 2025 at 2:47 PM
I was recently on a flight with free Wi-Fi for texting but nothing else.

Joke's on them: I can use Llama through WhatsApp now …
June 26, 2025 at 8:31 AM
Reposted by Leif Sieben
The new #RDKit blog post, inspired by a question from @valencekjell.com, looks at the impact of molecular size on similarity thresholds.
greglandrum.github.io/rdkit-blog/p...
The impact of molecular size on similarity. – RDKit blog
An exploration of how molecular size influences fingerprint similarity.
greglandrum.github.io
June 20, 2025 at 4:24 AM
Reposted by Leif Sieben
Yay for @pschwllr.bsky.social and @mlederbauer.bsky.social (and all your co-authors who aren't on Bluesky yet) 🥳

This #dataset is a prime example of #GoodData, and it ties nicely with what @clarakirkvold.bsky.social and @grynova.bsky.social were talking about a few weeks ago in their #JournalClub
🚨 Dataset article alert! 🚨

Wellawatte, @mlederbauer.bsky.social, @pschwllr.bsky.social and coauthors introduce a new open-source #dataset with >1,000 entries specifically designed for #LLMs applications in #chemistry. #MachineLearningScienceandTechnology #ChemSky 🧪

Article here: bit.ly/4kHdY6x
June 4, 2025 at 3:43 PM
Reposted by Leif Sieben
I've got a joke about Odysseus. I got lost on the way to the punchline...
I’ve got a joke about Theseus, but I’ve reworked it so often I’m not sure if it’s the same.
I’ve got a joke about Polyphemus - it’s a blinder.
June 13, 2025 at 8:16 PM
Reposted by Leif Sieben
June 13, 2025 at 1:59 AM
Reposted by Leif Sieben
change my mind:

Bruot's RIS to BibTeX converter is the best website ever built.

www.bruot.org/ris2bib/
Online RIS to BibTeX converter
The simple RIS (EndNote) to bib (BibTeX) online conversion app.
www.bruot.org
June 5, 2025 at 12:30 AM
For anybody out there working on antimicrobial resistance (AMR) who needs some motivation on this gloomy New England Monday.
June 9, 2025 at 2:35 PM
I think the ranking of things which are hard to predict goes:

1. The stock market.
2. LaTeX figure placement.
3. The meaning of life.
June 7, 2025 at 1:00 PM
Reposted by Leif Sieben
June 6, 2025 at 11:25 AM
Reposted by Leif Sieben
Cheminformatics family businesses be like
June 6, 2025 at 8:23 PM
One of the surprising things about working in a microbiology lab is that you become more worried about washing your hands before using the restroom rather than after.
June 6, 2025 at 4:09 PM
Reposted by Leif Sieben
I think the thing I'm most excited to see over the next ~10 years of #dataviz is web-based content that interweaves long-form text and modular interactives.

Not as heavy as scrollytelling and not as aimless as a dashboard, but something in between.

This is what I was going for with the QR project!
June 4, 2025 at 2:46 PM
You know volatility is going crazy when sitting down to write a PAC proof about the sampling efficiency of an active-learning algorithm feels like a therapy session.

At least math hasn't changed over the past 12 months ...
May 28, 2025 at 8:40 PM
Reposted by Leif Sieben
Not me accidentally typing `squeue` into the Facebook chat.
May 23, 2025 at 1:00 AM