Lightnews — Scholar-powered news

Andrew White 🐦‍⬛

@andrew.diffuse.one

Head of Sci/cofounder at futurehouse.org. Prof of chem eng at UofR (on sabbatical). Automating science with AI and robots in biology. Corvid enthusiast

Andrew White 🐦‍⬛

@andrew.diffuse.one

So we probably won't be getting a direct simulation of a whole virtual cell at meaningful timescales any time soon. Oh, and it would require 20x current earth power generation. 3/3

Read the analysis/blog post here: diffuse.one/p/d1-009

September 26, 2025 at 3:19 PM

Andrew White 🐦‍⬛

@andrew.diffuse.one

It sounds insane, but remember there are about 10^14 atoms in a human cell and about 10^20 femtoseconds in a day. And across multiple simulation engines, it takes roughly 10^4 FLOPs per atom per femtosecond. 2/3

September 26, 2025 at 3:19 PM
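
The arithmetic in this post can be checked with a short back-of-envelope sketch. It uses only the figures quoted above; the variable names are illustrative, and the power-generation comparison from the thread is not reproduced here because it depends on hardware-efficiency assumptions the post does not state.

```python
# Back-of-envelope check using only the figures quoted in the post.
ATOMS_PER_CELL = 1e14            # atoms in a human cell (quoted figure)
FLOPS_PER_ATOM_FS = 1e4          # FLOPs per atom per femtosecond (quoted figure)

SECONDS_PER_DAY = 24 * 60 * 60   # 86,400 s
FEMTOSECONDS_PER_SECOND = 1e15

# About 8.6e19, which the post rounds to 10^20.
femtoseconds_per_day = SECONDS_PER_DAY * FEMTOSECONDS_PER_SECOND

# Total floating-point operations to simulate one cell for one day.
total_flops = ATOMS_PER_CELL * femtoseconds_per_day * FLOPS_PER_ATOM_FS

print(f"{femtoseconds_per_day:.2e} fs per day")
print(f"{total_flops:.2e} FLOPs per simulated cell-day")
```

With the rounded figures from the post (10^14 x 10^20 x 10^4), the total is on the order of 10^38 FLOPs per simulated day.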

Andrew White 🐦‍⬛

@andrew.diffuse.one

yea, those are the model thoughts. It has a lot of mistakes in its thoughts. But you've got a very good eye! We'll make sure the final paper has a pristine example of its thoughts.

September 19, 2025 at 10:50 PM

Andrew White 🐦‍⬛

@andrew.diffuse.one

Very good point - I can re-run without that phrase.

September 16, 2025 at 5:21 PM

Andrew White 🐦‍⬛

@andrew.diffuse.one

If you don't put the phrase in quotes, it's an OR. So it was

"α" equation

which is equivalent to "α" OR equation

September 16, 2025 at 5:20 PM
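
The quoting behavior described in this post can be illustrated with a toy sketch. The function name `interpret_query` is hypothetical, and this is not the actual search engine's parser; it only mirrors the stated rule that quoted phrases stay intact while separate terms are combined with OR.

```python
import re

def interpret_query(query: str) -> str:
    """Toy model: keep quoted phrases whole, join all terms with OR."""
    # Each token is either a double-quoted phrase or a run of non-space characters.
    tokens = re.findall(r'"[^"]*"|\S+', query)
    return " OR ".join(tokens)

print(interpret_query('"α" equation'))    # two terms, so they are ORed
print(interpret_query('"α equation"'))    # one quoted phrase, so no OR
```

Under this toy rule, putting the whole phrase inside one pair of quotes is what avoids the implicit OR.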

Andrew White 🐦‍⬛

@andrew.diffuse.one

You can also look at it over time. Here's the relative popularity of different animal models in research over time.

Anyway, found this to be interesting. More details about it here: diffuse.one/p/d2-003 3/3

September 14, 2025 at 4:52 PM

Andrew White 🐦‍⬛

@andrew.diffuse.one

Here's one measuring the frequency of sample sizes. Like how often people use 8 samples vs 12 samples for reporting research results. N=2 is apparently the most popular 2/3

September 14, 2025 at 4:52 PM

Andrew White 🐦‍⬛

@andrew.diffuse.one

read it here: diffuse.one/p/d2-002

August 15, 2025 at 6:10 PM

Andrew White 🐦‍⬛

@andrew.diffuse.one

We make evals at FutureHouse. It’s hard and it sucks. It’s also now the bottleneck, as we scratch the boundary of human ability. HLE was a huge effort and produced many good questions, and we hope this analysis stimulates review of the other HLE categories and improvements. 7/7

July 23, 2025 at 4:29 PM

Andrew White 🐦‍⬛

@andrew.diffuse.one

We have written up our analysis: www.futurehouse.org/research-ann...
And made a gold subset on @huggingface that passed our review: huggingface.co/datasets/fut... 6/7

futurehouse/hle-gold-bio-chem · Datasets at Hugging Face

July 23, 2025 at 4:29 PM

Andrew White 🐦‍⬛

@andrew.diffuse.one

We reviewed 150 of the questions in the chem and bio categories and found about 30% have peer-reviewed papers contradicting their ground-truth answers. Issues include confusion of species with orders, misreading of FDA guidelines, etc. All our notes are public. 5/7

July 23, 2025 at 4:29 PM

Andrew White 🐦‍⬛

@andrew.diffuse.one

The HLE rubric wanted questions to have “objectively correct, univocal” ground-truth answers. You can find multiple peer-reviewed papers that contradict the statement "Oganesson was the rarest noble gas in 2002 as a percentage of terrestrial matter" 4/7

July 23, 2025 at 4:29 PM

Andrew White 🐦‍⬛

@andrew.diffuse.one

It’s a clever question. But it’s not really about frontier science. Multiple papers have shown that Oganesson is not a gas (it’s predicted to be a semiconducting solid), it’s not noble (it’s reactive), and it isn’t included in any "terrestrial matter" tables of noble gases. 3/7

July 23, 2025 at 4:29 PM

Andrew White 🐦‍⬛

@andrew.diffuse.one

The design process of HLE required the questions to be unanswerable by contemporary LLMs. That led to many gotcha-style questions like the one below. It’s a trick question – in 2002, a few atoms of the group 18 element Oganesson were made for a few milliseconds. 2/7

July 23, 2025 at 4:29 PM

Andrew White 🐦‍⬛

@andrew.diffuse.one

I just noticed it has sound lol. It's amazing

July 12, 2025 at 4:17 AM

Andrew White 🐦‍⬛

@andrew.diffuse.one

It may take a bit to extract the function, but here it is: github.com/Future-House...

ether0/src/ether0/rewards.py at c8cc676354e926b50ad206a606e04489bc9c95e3 · Future-House/ether0

A scientific reasoning model, dataset, and reward functions for chemistry. - Future-House/ether0

github.com

June 22, 2025 at 7:12 PM

Andrew White 🐦‍⬛

@andrew.diffuse.one

Although the discovery here is exciting, we are not claiming that we have cured dry AMD. Fully validating this hypothesis as a treatment for dry AMD will take human trials, which will take much longer.

Blog: www.futurehouse.org/research-ann...
Paper: arxiv.org/abs/2505.13400

Demonstrating end-to-end scientific discovery with Robin: a multi-agent system | FutureHouse

www.futurehouse.org

May 20, 2025 at 3:35 PM
