Andrew White 🐦‍⬛
@andrew.diffuse.one
Head of Sci/cofounder at futurehouse.org. Prof of chem eng at UofR (on sabbatical). Automating science with AI and robots in biology. Corvid enthusiast
So we probably won't be getting a direct simulation of a whole virtual cell at meaningful timescales any time soon. Oh, and it would require 20x current earth power generation. 3/3

Read the analysis/blog post here: diffuse.one/p/d1-009
diffuse.one
andrew white's blog.
September 26, 2025 at 3:19 PM
It sounds insane, but remember there are 10^14 atoms in a human cell and 10^20 femtoseconds in a day. And across multiple simulation engines, it takes about 10^4 FLOPs per atom per femtosecond. 2/3
September 26, 2025 at 3:19 PM
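To make the arithmetic in this thread concrete, here is a rough back-of-the-envelope sketch. The three input figures come straight from the post; the sustained compute rate (1e18 FLOP/s) is purely an illustrative assumption and is not from the post or the linked analysis.

```python
# Order-of-magnitude estimate using the figures quoted in the post.
ATOMS_PER_CELL = 1e14        # atoms in a human cell (from the post)
FS_PER_DAY = 1e20            # femtoseconds in a day (from the post; 86,400 s * 1e15 fs/s is about 8.64e19)
FLOPS_PER_ATOM_FS = 1e4      # FLOPs per atom per femtosecond (from the post)

# Total work to simulate one cell for one day of simulated time.
total_flops = ATOMS_PER_CELL * FS_PER_DAY * FLOPS_PER_ATOM_FS

# ASSUMPTION (not in the post): a machine sustaining 1e18 FLOP/s.
ASSUMED_FLOPS_PER_SECOND = 1e18
SECONDS_PER_YEAR = 365.25 * 24 * 3600

# Wall-clock time needed at the assumed rate.
wall_clock_years = total_flops / ASSUMED_FLOPS_PER_SECOND / SECONDS_PER_YEAR

print(f"total FLOPs: {total_flops:.1e}")
print(f"years at assumed rate: {wall_clock_years:.1e}")
```

Under these inputs the total comes to roughly 10^38 FLOPs for one simulated day, which is why the thread concludes a direct simulation is out of reach for now.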
Yeah, those are the model's thoughts. It has a lot of mistakes in its thoughts. But you've got a very good eye! We'll make sure the final paper has a pristine example of its thoughts.
September 19, 2025 at 10:50 PM
Very good point - I can re-run without that phrase.
September 16, 2025 at 5:21 PM
If you don't put the phrase in quotes, it's an OR. So it was

"α" equation

which is equivalent to "α" OR equation
September 16, 2025 at 5:20 PM
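A minimal sketch of the behavior described here, assuming a search tool that treats each whitespace-separated term as an OR clause and keeps double-quoted phrases together. The function name `to_or_query` and the regex are hypothetical illustrations, not the actual parser of whatever search backend was used.

```python
import re

def to_or_query(raw: str) -> str:
    """Hypothetical helper: show how an unquoted query expands into OR clauses."""
    # Match either a double-quoted phrase (kept whole) or a bare run of non-space characters.
    terms = re.findall(r'"[^"]*"|\S+', raw)
    return " OR ".join(terms)

# Two terms, one quoted and one bare: they become alternatives.
print(to_or_query('"α" equation'))

# One quoted phrase: it stays a single term, so no OR is introduced.
print(to_or_query('"α equation"'))
```

Quoting the whole phrase is what keeps the two words together as a single search term.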
You can also look at it over time. Here's the relative popularity of different animal models in research over time.

Anyway, found this to be interesting. More details about it here: diffuse.one/p/d2-003 3/3
September 14, 2025 at 4:52 PM
Here's one measuring the frequency of sample sizes. Like how often people use 8 samples vs 12 samples for reporting research results. N=2 is apparently the most popular 2/3
September 14, 2025 at 4:52 PM
read it here: diffuse.one/p/d2-002
August 15, 2025 at 6:10 PM
We make evals at FutureHouse. It’s hard and it sucks. It’s also now the bottleneck, as we scratch the boundary of human ability. HLE was a huge effort and made many good questions and we hope this analysis stimulates review of the other HLE categories and improvements 7/7
July 23, 2025 at 4:29 PM
We have written up our analysis: www.futurehouse.org/research-ann...
And made a gold subset on @huggingface that passed our review: huggingface.co/datasets/fut... 6/7
futurehouse/hle-gold-bio-chem · Datasets at Hugging Face
July 23, 2025 at 4:29 PM
We reviewed 150 of the questions in the chem and bio categories and found about 30% have peer-reviewed papers contradicting their ground-truth answers. Issues include confusion of species with orders, misreading of FDA guidelines, etc. All our notes are public. 5/7
July 23, 2025 at 4:29 PM
The HLE rubric wanted questions to have “objectively correct, univocal” ground-truth answers. You can find multiple peer-reviewed papers that contradict the statement "Oganesson was the rarest noble gas in 2002 as a percentage of terrestrial matter" 4/7
July 23, 2025 at 4:29 PM
It’s a clever question. But it’s not really about frontier science. Multiple papers have shown that Oganesson is not a gas (it’s predicted to be a semiconducting solid), it’s not noble (it’s reactive), and it isn’t included in any "terrestrial matter" tables of noble gases. 3/7
July 23, 2025 at 4:29 PM
The design process of HLE required the questions to be unanswerable by contemporary LLMs. That led to many gotcha-style questions like the one below. It’s a trick question – in 2002, a few atoms of a group 18 element, Oganesson, were made for a few milliseconds. 2/7
July 23, 2025 at 4:29 PM
I just noticed it has sound lol. It's amazing
July 12, 2025 at 4:17 AM
It may take a bit to extract the function, but here it is: github.com/Future-House...
ether0/src/ether0/rewards.py at c8cc676354e926b50ad206a606e04489bc9c95e3 · Future-House/ether0
A scientific reasoning model, dataset, and reward functions for chemistry. - Future-House/ether0
github.com
June 22, 2025 at 7:12 PM
Although the discovery here is exciting, we are not claiming that we have cured dry AMD. Fully validating this hypothesis as a treatment for dry AMD will take human trials, which will take much longer.

Blog: www.futurehouse.org/research-ann...
Paper: arxiv.org/abs/2505.13400
Demonstrating end-to-end scientific discovery with Robin: a multi-agent system | FutureHouse
www.futurehouse.org
May 20, 2025 at 3:35 PM