Nathan Lambert
@natolambert.bsky.social
An LLN - large language Nathan - (RL, RLHF, society, robotics), athlete, yogi, chef
Writes http://interconnects.ai
At Ai2 via HuggingFace, Berkeley, and normal places
Pinned
First draft online version of The RLHF Book is DONE. Recently I've been creating the advanced discussion chapters on everything from Constitutional AI to evaluation and character training, but I also sneak in consistent improvements to the RL specific chapter.

rlhfbook.com
First time at CMU
February 13, 2026 at 3:35 PM
Fun to set up real analytics and learn that my RLHF Book pdf is downloaded 50-100 times a day from my site (doesn't include Arxiv downloads/views).

Thanks for reading!
February 12, 2026 at 2:51 PM
Codex app is nice.
I'm just a few minutes in and think it'll make some of the crazy things I was doing way easier to monitor.
February 11, 2026 at 11:38 PM
Poll: Do you see the famous METR plot holding true on Jan. 1st of 2027 (~20 hours), or 2028 (~50 hours)?

What would be the right way to measure tasks of that scope?
February 11, 2026 at 5:12 PM
Beautiful RL scaling plot from Cursor.
cursor.com/blog/compose...
February 10, 2026 at 12:26 AM
TLDR: codex is a very useful coding tool, claude is the first agent.
February 9, 2026 at 3:40 PM
I spent a long time testing the new Opus 4.6 and Codex 5.3 models, but the most striking thing was how many people are reacting to model releases in ways that no longer match how we actually use models. In my post-benchmark era.

Claude is still king, but codex is closer than ever
www.interconnects.ai/p/opus-46-vs...
Opus 4.6, Codex 5.3, and the post-benchmark era
On comparing models in 2026.
www.interconnects.ai
February 9, 2026 at 3:21 PM
People don't want to accept that the most-used open model families in 2026 are:

Overall:
1. Qwen
2. Llama
3. GPT-OSS

Big models:
1. DeepSeek
2. GPT-OSS/Qwen/everyone else

Llama's inertia says a lot about how the ecosystem works.
February 8, 2026 at 5:45 PM
I want there to be a nanoGPT style speedrunning setup for RL.
February 6, 2026 at 7:29 PM
The best compliment I can give OpenAI's Codex 5.3 is that it feels way more like Claude Code.
February 6, 2026 at 6:07 PM
GPT Codex 5.3 sounds like a much bigger change than Claude Opus 4.6; will be curious if this holds up in real testing.
February 5, 2026 at 6:31 PM
“Due to GPT‑5.3-Codex being so different from its predecessors, the data from alpha testing exhibited numerous unusual and counter-intuitive results”

Sounds worth giving a go. Big changes are good.
February 5, 2026 at 6:16 PM
Reposted by Nathan Lambert
Reward models (RMs) are supposed to represent human values. But RMs are NOT blank slates – they inherit measurable biases from their base models that stubbornly persist through preference training. #ICLR2026 🧵
February 4, 2026 at 4:30 PM
Ending your day at >99% Claude rate limit usage but not maxing out feels like a masterpiece.
February 5, 2026 at 3:32 AM
Nvidia’s Nemotron is the closest thing the U.S. has to a Qwen approach to open models, but most people don’t know it yet.
I’m very bullish on Nvidia’s open model efforts in 2026.
Interconnects interview #17 on the past, present, and future of the Nemotron project.
www.youtube.com/watch?v=Y3Vb...
Why NVIDIA builds their own open models | Nemotron w/ Bryan Catanzaro
NVIDIA releasing their best models as open weights isn't charity — it's a business decision. And honestly, it's one of the clearest explanations I've heard for why a company would invest heavily in…
www.youtube.com
February 4, 2026 at 6:05 PM
Qwen already dropping models for CNY
February 3, 2026 at 5:48 PM
Gemini not being in the conversation at all with Claude Code and Codex is the real “code red” emergency.
February 3, 2026 at 3:23 PM
It's documented! I did a full memory sweep. The training becomes FLOP-limited before memory saturates.
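
(Rough intuition for that kind of comparison, as a back-of-envelope sketch; every number below is an illustrative assumption, not from the actual sweep.)

    def training_budget(n_params, tokens_per_step, bytes_per_param=16,
                        peak_flops=1e15, mem_bytes=128e9):
        # bytes_per_param ~16 covers bf16 weights + grads and fp32 Adam moments.
        # peak_flops and mem_bytes are placeholder device specs, not real hardware.
        flops = 6 * n_params * tokens_per_step          # standard 6ND estimate
        step_seconds = flops / peak_flops               # time if fully FLOP-bound
        mem_fraction = (n_params * bytes_per_param) / mem_bytes  # ignores activations
        return step_seconds, mem_fraction

    # e.g. a 1B model at a ~512k-token batch: ~3s/step at peak FLOPs,
    # while weights + optimizer state fill only ~12% of a 128 GB device.
    print(training_budget(1e9, 2**19))

The point of the sketch: at small model sizes on a big-memory device, step time hits the compute ceiling long before the parameter and optimizer state come close to filling memory.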
February 2, 2026 at 8:36 PM
Latest open artifacts (#18): Arcee's 400B MoE, LiquidAI's underrated 1B model, new Kimi, and anticipation of a busy month
Tons of useful "niche" models and anticipation of big releases coming soon.
www.interconnects.ai/p/latest-ope...
Latest open artifacts (#18): Arcee, LiquidAI and Moonshot ...
Tons of useful "niche" models and anticipation of big releases coming soon.
www.interconnects.ai
February 2, 2026 at 3:23 PM
Despite HuggingFace being banned in China, Chinese users (likely via VPNs) are its top user group. They definitely have the most people *building* open models.
February 1, 2026 at 5:07 PM
Claude Code writing, Codex code review, and GPT Pro for planning made a working DPO (and related algorithms) repository from scratch for my RLHF book, and the curves are looking right.

On the DGX Spark, finetuning OLMo 2 1B SFT. Built by referencing the original repositories + TRL.
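
(For the curious: the core DPO objective is only a few lines. A minimal PyTorch sketch below; the function name and signature are illustrative, not taken from the repo.)

    import torch.nn.functional as F

    def dpo_loss(policy_chosen_logps, policy_rejected_logps,
                 ref_chosen_logps, ref_rejected_logps, beta=0.1):
        # Inputs: summed per-token log-probs of each completion under the
        # trained policy and the frozen reference model (the SFT checkpoint).
        chosen_logratios = policy_chosen_logps - ref_chosen_logps
        rejected_logratios = policy_rejected_logps - ref_rejected_logps
        # DPO widens the chosen-vs-rejected margin relative to the
        # reference model, with beta controlling the KL-like penalty strength.
        return -F.logsigmoid(beta * (chosen_logratios - rejected_logratios)).mean()

Everything else in such a repo, batching, tokenization, and running the frozen reference model, is plumbing around this objective.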
February 1, 2026 at 3:41 PM
Recorded a podcast, think it’s pretty good and comprehensive, hope you like it ;) youtu.be/EV7WhVT270Q?...
State of AI in 2026: LLMs, Coding, Scaling Laws, China, Agents, GPUs, AGI | Lex Fridman Podcast #490
YouTube video by Lex Fridman
youtu.be
January 31, 2026 at 11:06 PM
I'm visiting CMU for a talk at the Language Technologies Institute on Feb 12/13. Looking forward to chatting with folks about frontiers in RL and building agentic language models.

Email me with "CMU Visit" in the subject if you're interested in chatting & why!
January 31, 2026 at 8:03 PM