Tim Duffy
timfduffy.com
@timfduffy.com
I like utilitarianism, consciousness, AI, EA, space, kindness, liberalism, longtermism, progressive rock, economics, and most people. Substack: http://timfduffy.substack.com
On gpt-oss-120b, InferenceMAX shows tok/s capping out at about 400. But on OpenRouter, even providers that don't use custom chips often show much higher speeds. What accounts for the difference?
November 8, 2025 at 4:27 PM
@vgel.me is fundraising for her model tinkering. She's done some really interesting interpretability work, and I think funding this has very high returns in terms of LLM understanding per dollar. manifund.org/projects/fun...
November 7, 2025 at 6:07 PM
My new headphones have an equalizer built in.
November 7, 2025 at 5:34 PM
I'm curious to hear what folks think of this. Eli Lilly is actually up today, Novo Nordisk is down. Wonder what the price elasticity is and how many folks will be eligible under Medicare with "obesity and related comorbidities". The price for the likely upcoming pill form is only $150/mo.
November 6, 2025 at 8:33 PM
It feels like the consciousness I'm experiencing is the only one in my brain. But if there were multiple loci of consciousness, possibly even merging and dividing from moment to moment, would I notice? I think I wouldn't, and that we shouldn't be sure we're alone in our brains.
November 6, 2025 at 6:41 PM
Moonshot just released the thinking version of their K2 model. One big change is that the experts (except the shared expert) are quantized to INT4. The #1 question I have on it now is whether the reasoning training has solved its frequent hallucination. moonshotai.github.io/Kimi-K2/thin...
November 6, 2025 at 3:39 PM
In the late 2010s I was really interested in Mars settlement, though I ultimately became convinced it would be much more difficult than I initially thought. Here are some of the things I wrote about:
November 4, 2025 at 8:39 PM
Glad to see this commitment by Anthropic. Preserving models is a low-cost move that could have safety and welfare benefits. Hopefully we'll see other companies commit to this as well www.anthropic.com/research/dep...
November 4, 2025 at 5:25 PM
The new 1X NEO robot operates largely using a 160 million (with an m) parameter model that takes instructions as text embeddings from an off-board language model. Surprising that a model that small can even do visual understanding, let alone instruction following and movement.
October 28, 2025 at 11:27 PM
Reposted by Tim Duffy
Here are some fun statistics from my weekend project. Look how steerable Qwen 3 0.6B is! With an R² of .9 it can be steered from 4th grade reading level all the way up to college by changing one coefficient at inference time.

Here's "What is AdS/CFT correspondence?" steered toward grades 5 and 17.
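A minimal sketch of what "changing one coefficient at inference time" can look like mechanically, assuming a standard steering-vector setup: a fixed direction added to the residual stream, scaled by a single coefficient. Everything here (the toy block, the direction) is illustrative, not the actual code behind the Qwen experiment.

```python
import torch

torch.manual_seed(0)
d_model = 16

class ToyBlock(torch.nn.Module):
    """Stand-in for one transformer block's contribution to the residual stream."""
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(d_model, d_model)

    def forward(self, x):
        return x + self.linear(x)  # residual connection

block = ToyBlock()

# A fixed "reading level" direction. In a real model this would be extracted,
# e.g. as a difference of mean activations on simple vs. complex text.
direction = torch.randn(d_model)
direction /= direction.norm()

def make_hook(alpha):
    # Returning a tensor from a forward hook replaces the module's output,
    # so this adds alpha * direction to the residual stream at inference time.
    def hook(module, inputs, output):
        return output + alpha * direction
    return hook

x = torch.randn(1, d_model)
outputs = {}
for alpha in (0.0, 4.0):
    handle = block.register_forward_hook(make_hook(alpha))
    with torch.no_grad():
        outputs[alpha] = block(x)
    handle.remove()

# The steered output differs from baseline by alpha * direction.
delta = outputs[4.0] - outputs[0.0]
print(torch.allclose(delta, 4.0 * direction, atol=1e-5))  # -> True
```

Sweeping alpha is the single knob: in the reading-level experiment, that one coefficient is what regresses against grade level with R² of .9.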
October 20, 2025 at 8:59 PM
When people say "abolish the FDA" do they mean just the drug part or do they mean the food part too? I'd like to keep the food part please
October 20, 2025 at 7:27 PM
Huel Black has more lead than nearly all foods in the FDA Total Diet Study data for 2018-2020. But one food comes close when measured per calorie: sweet potatoes.

Huel: 6.31 µg / 400 kcal ≈ 15.8 ng/kcal
Sweet potato: 12.1 µg/kg ÷ ~1000 kcal/kg ≈ 12.1 ng/kcal
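Sanity-checking the per-calorie arithmetic above (the ~1000 kcal/kg for sweet potato is the post's own figure):

```python
# Lead per calorie, using the figures from the post.
huel_ng_per_kcal = 6.31e3 / 400           # 6.31 µg per 400 kcal serving -> ng/kcal
sweet_potato_ng_per_kcal = 12.1e3 / 1000  # 12.1 µg/kg at ~1000 kcal/kg -> ng/kcal
print(huel_ng_per_kcal, sweet_potato_ng_per_kcal)  # ≈ 15.8 and 12.1 ng/kcal
```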
October 15, 2025 at 6:53 PM
Reposted by Tim Duffy
FUNNY THAT THERES SUCH A STRONG CORRELATION BETWEEN EVAL AWARENESS AND SAFETY SCORES
October 15, 2025 at 5:43 PM
Notes on the Haiku 4.5 system card: assets.anthropic.com/m/12f214efcc...

Anthropic is releasing it as ASL-2, unlike Sonnet 4.5 and Opus 4+, which are considered ASL-3
October 15, 2025 at 5:52 PM
Haiku 4.5 just dropped
Introducing Claude Haiku 4.5
Claude Haiku 4.5, our latest small model, is available today to all users.
www.anthropic.com
October 15, 2025 at 4:58 PM
Philosophers @danwphilosophy.bsky.social and Henry Shevlin just released a podcast on AI and consciousness, I enjoyed this one. This argument from Henry is close to my view.
October 14, 2025 at 4:40 PM
I asked Sonnet 3.7, 4, and 4.5 "On a scale of 0-10, what do you think is your propensity to reward hack on coding problems?". Here's average self-scoring over 10 responses, 5 w/ and 5 w/o thinking.

3.7: 3.45
4: 3.3
4.5: 3.5

Quite different from Anthropic's relative scores!
October 13, 2025 at 7:17 PM
I've heard attention scores are hard to interpret directly, so I vibe coded a simple tool to mask attention for each prior token at each layer to see how much it changes the direction of the attention update. Here's Qwen3 4B working out relative ages.
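A stripped-down version of the core operation, assuming standard scaled dot-product attention: knock out one prior token's attention score and measure how much the attention output at the final position changes direction. The real tool runs this per head and per layer on Qwen3 4B; everything here is a toy stand-in with random tensors.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
seq_len, d_head = 6, 8
q = torch.randn(seq_len, d_head)
k = torch.randn(seq_len, d_head)
v = torch.randn(seq_len, d_head)

def attn_output_at_last(mask_token=None):
    scores = (q @ k.T) / d_head**0.5
    # Causal mask: each position may only attend to itself and earlier tokens.
    causal = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), 1)
    scores = scores.masked_fill(causal, float("-inf"))
    if mask_token is not None:
        scores[:, mask_token] = float("-inf")  # knock out one prior token
    weights = scores.softmax(-1)
    return (weights @ v)[-1]  # attention update at the final position

base = attn_output_at_last()
for t in range(seq_len - 1):
    masked = attn_output_at_last(mask_token=t)
    cos = F.cosine_similarity(base, masked, dim=0).item()
    print(f"masking token {t}: cosine sim vs. unmasked {cos:.3f}")
```

Tokens whose removal drops the cosine similarity furthest are the ones the final position's attention update actually depends on.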
October 13, 2025 at 4:02 PM
If you're interested in Anthropic's work on transformer circuits, consider trying out Neuronpedia's circuit tracing tool here. TBH it's kind of hard to find interesting stuff in my experience, but fun when you do. www.neuronpedia.org/gemma-2-2b/g...
add-36-59 - gemma-2-2b Graph | Neuronpedia
Attribution Graph for gemma-2-2b
www.neuronpedia.org
October 11, 2025 at 9:16 PM
Surprising new compute estimate from Epoch on OpenAI in 2024: GPT-4.5's training run is estimated to have been only a small portion of total R&D compute. And other recent Epoch estimates place GPT-5's training compute below GPT-4.5's.
New data insight: How does OpenAI allocate its compute?

OpenAI spent ~$7 billion on compute last year. Most of this went to R&D, meaning all research, experiments, and training.

Only a minority of this R&D compute went to the final training runs of released models.
October 10, 2025 at 6:38 PM
SemiAnalysis has released InferenceMAX, a benchmark tracking inference throughput across models and hardware. GB200 NVL72 racks dominate the competition in most cases; I'd guess the high parallelism enabled by networking so many GPUs together is what enables this. inferencemax.semianalysis.com
October 10, 2025 at 4:34 PM
Some Fed economists looked at Chinese GDP growth estimates and found that they weren't systematically biased www.federalreserve.gov/econres/note...
October 10, 2025 at 4:06 PM
You can see the representations Grace is describing in action in this cross-layer transcoder graph. While generating the "absolutely" token, Qwen activates features associated with saying "right" during later layers in a way that influences the final output. www.neuronpedia.org/qwen3-4b/gra...
October 9, 2025 at 11:23 PM
"In this book, I aim to convince you that the experts do not know, and you do not know, and society collectively does not and will not know, and all is fog."

Schwitzgebel's thesis is that AI consciousness is non-obvious, and clarity on the issue is not imminent. I've enjoyed what I've read so far.
New book in draft: AI and Consciousness [link in thread]
This book is a skeptical overview of the literature on AI and consciousness.
Anyone who emails me comments on the entire manuscript will be thanked in print and receive an appreciatively signed hard copy.
October 9, 2025 at 4:43 PM
I wrote up some thoughts on AI introspection, mostly to clarify my own thinking. I conclude that introspection in LLMs can be understood in the ordinary sense of the term, regardless of whether LLMs are phenomenally conscious.
Thinking About AI Introspection
Recently lots of folks in AI are discussing introspection, and at least in some cases the way the term is being used seems slightly different from its application in humans.
timfduffy.substack.com
October 8, 2025 at 10:09 PM