Chris Olah
@colah.bsky.social
Reverse engineering neural networks at Anthropic. Previously Distill, OpenAI, Google Brain. Personal account.
Reposted by Chris Olah
Political violence is bad. It usually begets more political violence.

Celebrating political violence is bad. It usually encourages more political violence, against various targets.

Campus shootings are bad. They make everyone on campus less safe.

It's bad that what I wrote here is controversial.
September 10, 2025 at 7:06 PM
Applications for Anthropic AI Safety Fellows are due Aug 17!

US: job-boards.greenhouse.io/anthropic/jo...
UK: job-boards.greenhouse.io/anthropic/jo...
CA: job-boards.greenhouse.io/anthropic/jo...

It's a great opportunity to get mentorship and funding to work on safety for ~2 months.
Anthropic AI Safety Fellow, US
Remote-Friendly (Travel Required) | San Francisco, CA
August 12, 2025 at 6:54 PM
I've been talking about interference weights as a challenge for mechanistic interpretability for a while.

A short note discussing them - transformer-circuits.pub/2025/interfe...
A Toy Model of Interference Weights
July 29, 2025 at 11:33 PM
A number of people have asked me why we titled our recent paper "On the Biology of a Large Language Model".

Why call it "biology"?
May 13, 2025 at 7:34 PM
The elegance of ML is the elegance of biology, not the elegance of math or physics.

Simple gradient descent creates mind-boggling structure and behavior, just as evolution creates the awe-inspiring complexity of nature.

x.com/banburismus_...
May 13, 2025 at 7:32 PM
The Anthropic Interpretability Team is planning a virtual Q&A to answer Qs about how we plan to make models safer, the role of the team at Anthropic, where we’re headed, and what it’s like to work here!

Please let us know if you’d be interested docs.google.com/forms/d/e/1F...
Interest in Attending a Virtual Q&A with Members of the Anthropic Interpretability Team
We’re planning on doing a virtual Q&A with members of the Anthropic Interpretability team in the near future. We’ll talk about and answer questions about the role of interpretability in making models ...
May 8, 2025 at 7:55 PM
Can we understand the mechanisms of a frontier AI model?

📝 Blog post: www.anthropic.com/research/tra...
🧪 "Biology" paper: transformer-circuits.pub/2025/attribu...
⚙️ Methods paper: transformer-circuits.pub/2025/attribu...

Featuring basic multi-step reasoning, planning, introspection and more!
On the Biology of a Large Language Model
March 27, 2025 at 6:18 PM
I've verified this account from my original Twitter account here - https://twitter.com/ch402/status/1678198774349307905
Tweet by @ch402
“@moultano Thanks everyone who shared invites! I'm now also on Blue Sky as "colah"!”
July 10, 2023 at 12:26 AM