Chris Olah
@colah.bsky.social
Reverse engineering neural networks at Anthropic. Previously Distill, OpenAI, Google Brain. Personal account.
Reposted by Chris Olah
Political violence is bad. It usually begets more political violence.

Celebrating political violence is bad. It usually encourages more political violence, against various targets.

Campus shootings are bad. They make everyone on campus less safe.

It's bad that what I wrote here is controversial.
September 10, 2025 at 7:06 PM
Applications for Anthropic AI Safety Fellows are due Aug 17!

US: job-boards.greenhouse.io/anthropic/jo...
UK: job-boards.greenhouse.io/anthropic/jo...
CA: job-boards.greenhouse.io/anthropic/jo...

It's a great opportunity to get mentorship and funding to work on safety for ~2 months.
Anthropic AI Safety Fellow, US
Remote-Friendly (Travel Required) | San Francisco, CA
August 12, 2025 at 6:54 PM
I've been talking about interference weights as a challenge for mechanistic interpretability for a while.

A short note discussing them - transformer-circuits.pub/2025/interfe...
A Toy Model of Interference Weights
July 29, 2025 at 11:33 PM
A number of people have asked me why we titled our recent paper "On the Biology of a Large Language Model".

Why call it "biology"?
May 13, 2025 at 7:34 PM
The elegance of ML is the elegance of biology, not the elegance of math or physics.

Simple gradient descent creates mind-boggling structure and behavior, just as evolution creates the awe-inspiring complexity of nature.

x.com/banburismus_...
May 13, 2025 at 7:32 PM
The Anthropic Interpretability Team is planning a virtual Q&A to answer Qs about how we plan to make models safer, the role of the team at Anthropic, where we’re headed, and what it’s like to work here!

Please let us know if you’d be interested docs.google.com/forms/d/e/1F...
Interest in Attending a Virtual Q&A with Members of the Anthropic Interpretability Team
We’re planning on doing a virtual Q&A with members of the Anthropic Interpretability team in the near future. We’ll talk about and answer questions about the role of interpretability in making models ...
May 8, 2025 at 7:55 PM
Can we understand the mechanisms of a frontier AI model?

📝 Blog post: www.anthropic.com/research/tra...
🧪 "Biology" paper: transformer-circuits.pub/2025/attribu...
⚙️ Methods paper: transformer-circuits.pub/2025/attribu...

Featuring basic multi-step reasoning, planning, introspection and more!
On the Biology of a Large Language Model
March 27, 2025 at 6:18 PM
I've verified this account from my original Twitter account here - https://twitter.com/ch402/status/1678198774349307905
Tweet by @ch402
“@moultano Thanks everyone who shared invites! I'm now also on Blue Sky as "colah"!”
July 10, 2023 at 12:26 AM