David Bau
@davidbau.bsky.social
Interpretable Deep Networks. http://baulab.info/ @davidbau
The secret life of an LM is defined by its internal data types. Inner layers transport abstractions that are more robust than words, like concepts, functions, or pointers.
In new work yesterday, @arnabsensharma.bsky.social et al identify a data type for *predicates*.
bsky.app/profile/arn...
Arnab Sen Sharma (@arnabsensharma.bsky.social)
How can a language model find the veggies in a menu?
New pre-print where we investigate the internal mechanisms of LLMs when filtering on a list of options.
Spoiler: turns out LLMs use strategies surprisingly similar to functional programming (think "filter" from python)! 🧵
bsky.app
November 6, 2025 at 2:00 PM
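For readers who have not met the "filter" idiom mentioned in the quoted post, here is a minimal Python sketch of filtering a list with a predicate. It is only an analogy for the programming concept: the menu items, the `VEGGIES` set, and the helper names are invented for illustration, and nothing here describes how the model is actually implemented internally.

```python
from typing import Callable, Iterable, List

# Hypothetical menu and category table, invented for illustration only.
MENU = ["steak", "carrot", "salmon", "spinach", "bread"]
VEGGIES = {"carrot", "spinach", "broccoli"}


def is_veggie(item: str) -> bool:
    """Predicate: True when the item is listed as a vegetable."""
    return item in VEGGIES


def filter_options(predicate: Callable[[str], bool],
                   options: Iterable[str]) -> List[str]:
    """Keep only the options for which the predicate holds, in order."""
    return list(filter(predicate, options))


# The predicate is a value in its own right: it can be passed around
# and applied to any list of options.
veggies_on_menu = filter_options(is_veggie, MENU)
starts_with_s = filter_options(lambda s: s.startswith("s"), MENU)
```

The point of the analogy is that the predicate (`is_veggie`) is separate from the list it is applied to, so the same predicate can be reused on a different menu.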
What does an LLM do when it translates from Italian "amore" to Spanish "amor" or French "amour"?
That's easy! (you might think) Because surely it knows: amore, amor, amour are all based on the same Latin word. It can just drop the "e", or add a "u".
October 11, 2025 at 12:02 PM
Looking forward to #COLM2025 tomorrow. DM me if you'll also be there and want to meet to chat.
Who is going to be at #COLM2025?
I want to draw your attention to a COLM paper by my student @sfeucht.bsky.social that has totally changed the way I think and teach about LLM representations. The work is worth knowing.
And you can meet Sheridan at COLM, Oct 7!
bsky.app/profile/sfe...
October 6, 2025 at 12:10 PM
There are a lot of interesting details that surface when you use SAEs to understand and control diffusion image synthesis models. Learn more in @wendlerc.bsky.social's talk.
New YouTube video posted! @wendlerc.bsky.social presents his work using SAEs for diffusion text-to-image models. The authors find interpretable SAE features and demonstrate how these features can alter generated images.
Watch here: youtu.be/43NnaqGjArA
Interpreting SDXL Turbo Using Sparse Autoencoders with Chris Wendler
In this talk, Chris Wendler presents his recent work on using sparse autoencoders for diffusion models. In this work, they train SAEs on SDXL Turbo, finding ...
www.youtube.com
October 3, 2025 at 6:52 PM
On the Good Fight podcast w substack.com/@yaschamounk I give a quick but careful primer on how modern AI works.
I also chat about our responsibility as machine learning scientists, and what we need to fix to get AI right.
Take a listen and reshare -
www.persuasion.community/p/david-bau
David Bau on How Artificial Intelligence Works
Yascha Mounk and David Bau delve into the “black box” of AI.
www.persuasion.community
October 3, 2025 at 8:58 AM
I love the 'opinionated' approach taken by Aaron + team in this survey. It captures the ongoing work around the central causal puzzles we face in mechanistic interpretability.
What's the right unit of analysis for understanding LLM internals? We explore in our mech interp survey (a major update from our 2024 ms).
We’ve added more recent work and more immediately actionable directions for future work. Now published in Computational Linguistics!
October 1, 2025 at 2:25 PM
Who is going to be at #COLM2025?
I want to draw your attention to a COLM paper by my student @sfeucht.bsky.social that has totally changed the way I think and teach about LLM representations. The work is worth knowing.
And you can meet Sheridan at COLM, Oct 7!
bsky.app/profile/sfe...
September 27, 2025 at 8:54 PM
Announcing a broad expansion of the National Deep Inference Fabric.
This could be relevant to your research...
September 26, 2025 at 6:47 PM
The NDIF YouTube talk series continues... Don't miss the fascinating talks by Xu Pan and Josh Engels on the NDIF YouTube channel.
www.youtube.com/channel/UCaQ...
September 20, 2025 at 7:20 PM
In the wake of the Jimmy Kimmel firing: Do not underestimate the power of the truth.
The truth is our superpower.
davidbau.com/archives/202...
davidbau.com The Truth is Our Superpower
davidbau.com
September 20, 2025 at 7:17 PM
Reposted by David Bau
Monday: Trump tries to fire Fed Governor Lisa Cook (first time in 111 years).
Thursday: CDC chief dismissed, four top scientists resign.
Discredit, dismiss, blame.
History shows exactly where this three-step pattern leads.
August 29, 2025 at 2:04 AM
This Friday NEMI 2025 is at Northeastern in Boston, 8 talks, 24 roundtables, 90 posters; 200+ attendees. Thanks to
goodfire.ai/ for sponsoring! nemiconf.github.io/summer25/
If you can't make it in person, the livestream will be here:
www.youtube.com/live/4BJBis...
New England Mechanistic Interpretability Workshop
About: The New England Mechanistic Interpretability (NEMI) workshop aims to bring together academic and industry researchers from the New England and surround...
www.youtube.com
August 18, 2025 at 6:06 PM
Announcing a deep net interpretability talk series!
Every week you will find new talks on recent research in the science of neural networks. The first few are posted: jackmerullo.bsky.social, Roy Rinberg, and me.
At the @ndif-team.bsky.social Youtube Channel: www.youtube.com/@NDIFTeam
NDIF Team
We're a research computing project cracking open the mysteries inside large-scale AI systems.
The NSF National Deep Inference Fabric consists of a unique combination of hardware and software that provides a remotely-accessible computing resource for scientists and students to perform detailed and reproducible experiments on large pretrained AI models, such as open large language models.
We aim to make AI interpretability research more accessible through this channel by publishing lectures and educational content covering real interpretability research.
www.youtube.com
August 18, 2025 at 6:02 PM
The New England Mechanistic Interpretability Workshop, NEMI 2025 is August 22 in Boston.
Talks, posters, meals, discussion... Most of all, an excellent chance to chat about new ideas with other great researchers in the field!
Help spread the word - register and repost -
bsky.app/profile/koy...
Koyena Pal (@koyena.bsky.social)
🚨 Registration is live! 🚨
The New England Mechanistic Interpretability (NEMI) Workshop is happening Aug 22nd 2025 at Northeastern University!
A chance for the mech interp community to nerd out on how models really work 🧠🤖
🌐 Info: nemiconf.github.io/summer25/
📝 Register: https://forms.gle/v4kJCweE3UUHUE81A
bsky.app
July 1, 2025 at 3:00 PM
The new "Lookback" paper from @nikhil07prakash.bsky.social contains a surprising insight...
70b/405b LLMs use double pointers, akin to C programmers' double (**) pointers. They show up when the LLM is "knowing what Sally knows Ann knows", i.e., Theory of Mind.
bsky.app/profile/nik...
@nikhil07prakash.bsky.social
How do language models track mental states of each character in a story, often referred to as Theory of Mind?
We reverse-engineered how LLaMA-3-70B-Instruct handles a belief-tracking task and found something surprising: it uses mechanisms strikingly similar to pointer variables in C programming!
bsky.app
June 25, 2025 at 3:00 PM
FRIENDS: American science is being decimated by Congress NOW.
Your help is needed to fix this. The current DC plan PERMANENTLY slashes NSF, NIH, all science training. Money isn't redirected—it's gone.
Please read+share what's happening
thevisible.net/posts/004-s...
June 3, 2025 at 4:15 PM
Because of propaganda Americans do not understand what Rubio is doing with visas. "I gave you a visa to come and study," they think.
x.com/CitizenFree...
NO, he has not!! Please help explain to X how Rubio has stopped *ALL* student visas, and how it is killing US science.
May 29, 2025 at 10:27 AM
When setting up my AI lab I faced a choice between Toronto and Boston. I chose Boston, my home and the world's best incubator for research talent.
Here you can take a short stroll to meet with top minds in hundreds of fields from AI to astronomy, batteries to biotech.
May 28, 2025 at 1:29 PM
Black Box, Blood Money
Friday evening, an Italian tourist escaped a torturer in Manhattan who was after his crypto password. I asked Anthropic's Opus 4 to analyze and explain what the episode might teach us about AI.
It critiqued my guidance, instead proposing a focus on VCs:
May 25, 2025 at 1:12 PM
Please join me in celebrating the contributions of our international students, researchers, and visitors.
Here is a reminder of what makes America unique, and why no other nation can touch USA's 420 Nobel prizes:
May 23, 2025 at 4:29 AM
How to build AI leadership in the U.S.?
It is not about the chips. It is about the people!
I spoke about AI interpretability at ntird.gov/ last week. (NTIRD is the joint program between 23 federal agencies that coordinates government technology investments.)
May 17, 2025 at 4:10 PM
My grandfather was a WW2 American Army veteran who became a proud cataloger at the Library of Congress. As a toddler I remember walking to his Library office from his A Street home, with a stop at the playground.
The LOC has always been the jewel of America for me.
May 10, 2025 at 3:00 PM
Leon Bottou's post-ICLR thoughts are worth a read.
He reminds us that modern AI is not just a product: it is a scientific wonder that we still do not understand.
leon.bottou.org/news/two_les...
news:two_lessons_from_iclr_2025 [leon.bottou.org]
leon.bottou.org
May 3, 2025 at 11:07 AM
In academia, we treat too many of our customs as "awards to be won" rather than obligations to the community.
It seems wrong that "getting a paper through peer review" is seen by many like an award, a game to win, rather than a duty.
Peer review forces writers and readers to *teach* each other.
In AI we've seen the rise of blog publishing outside of peer review (even outside arXiv).
Generally cool, but the loss of a standard form makes me miss it: many posts are too small, too big, or disconnected from prevailing knowledge.
Peer-review builds community by forcing you to read and be read.
April 23, 2025 at 7:22 AM