Eric Todd
@ericwtodd.bsky.social
CS PhD Student, Northeastern University - Machine Learning, Interpretability https://ericwtodd.github.io
Reposted by Eric Todd
LLMs have been shown to provide different predictions in clinical tasks when patient race is altered. Can sparse autoencoders (SAEs) spot this undue reliance on race? 🧵

Work w/ @byron.bsky.social

Link: arxiv.org/abs/2511.00177
November 5, 2025 at 3:20 PM
Reposted by Eric Todd
Interested in doing a PhD at the intersection of human and machine cognition? ✨ I'm recruiting students for Fall 2026! ✨

Topics of interest include pragmatics, metacognition, reasoning, & interpretability (in humans and AI).

Check out JHU's mentoring program (due 11/15) for help with your SoP 👇
The Department of Cognitive Science @jhu.edu is seeking motivated students interested in joining our interdisciplinary PhD program! Applications are due 1 Dec.

Our PhD students also run an application mentoring program for prospective students. Mentoring requests due November 15.

tinyurl.com/2nrn4jf9
November 4, 2025 at 2:44 PM
Reposted by Eric Todd
How can a language model find the veggies in a menu?

New pre-print where we investigate the internal mechanisms of LLMs when filtering on a list of options.

Spoiler: it turns out LLMs use strategies surprisingly similar to functional programming (think "filter" from Python)! 🧵
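As a rough illustration of the analogy (the menu and predicate below are made up, not from the paper), this is the functional-programming "filter" pattern the post refers to:

```python
# Python's built-in filter: apply a predicate to each item and keep the matches.
menu = ["grilled salmon", "veggie burger", "caesar salad", "roasted carrots"]

def is_veggie(item: str) -> bool:
    # Toy predicate for illustration; the paper studies how an LLM makes this
    # judgment internally, not via a hard-coded lookup.
    return item in {"veggie burger", "caesar salad", "roasted carrots"}

veggie_options = list(filter(is_veggie, menu))
print(veggie_options)  # ['veggie burger', 'caesar salad', 'roasted carrots']
```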
November 4, 2025 at 5:48 PM
Looking forward to attending #COLM2025 this week! Would love to meet up and chat with others about interpretability + more. DMs are open if you want to connect. Be sure to check out @sfeucht.bsky.social's very cool work on understanding concepts in LLMs tomorrow morning (Poster 35)!
[📄] Are LLMs mindless token-shifters, or do they build meaningful representations of language? We study how LLMs copy text in-context, and physically separate out two types of induction heads: token heads, which copy literal tokens, and concept heads, which copy word meanings.
October 6, 2025 at 3:00 PM
Reposted by Eric Todd
What's the right unit of analysis for understanding LLM internals? We explore this question in our mech interp survey (a major update of our 2024 manuscript).

We’ve added more recent work and more immediately actionable directions for future work. Now published in Computational Linguistics!
October 1, 2025 at 2:03 PM
Reposted by Eric Todd
Who is going to be at #COLM2025?

I want to draw your attention to a COLM paper by my student @sfeucht.bsky.social that has totally changed the way I think and teach about LLM representations. The work is worth knowing.

And you can meet Sheridan at COLM, Oct 7!
bsky.app/profile/sfe...
September 27, 2025 at 8:54 PM
Reposted by Eric Todd
Announcing a broad expansion of the National Deep Inference Fabric.

This could be relevant to your research...
September 26, 2025 at 6:47 PM
Reposted by Eric Todd
"AI slop" seems to be everywhere, but what exactly makes text feel like "slop"?

In our new work (w/ @tuhinchakr.bsky.social, Diego Garcia-Olano, @byron.bsky.social ) we provide a systematic attempt at measuring AI "slop" in text!

arxiv.org/abs/2509.19163

🧵 (1/7)
September 24, 2025 at 1:21 PM
Reposted by Eric Todd
Wouldn’t it be great to have questions about LM internals answered in plain English? That’s the promise of verbalization interpretability. Unfortunately, our new paper shows that evaluating these methods is nuanced—and verbalizers might not tell us what we hope they do. 🧵👇1/8
September 17, 2025 at 7:19 PM
Reposted by Eric Todd
This Friday, NEMI 2025 is at Northeastern in Boston: 8 talks, 24 roundtables, 90 posters, and 200+ attendees. Thanks to goodfire.ai for sponsoring! nemiconf.github.io/summer25/

If you can't make it in person, the livestream will be here:
www.youtube.com/live/4BJBis...
August 18, 2025 at 6:06 PM
Reposted by Eric Todd
We've added a quick new section to this paper, which was just accepted to @COLM_conf! By summing weights of concept induction heads, we created a "concept lens" that lets you read out semantic information in a model's hidden states. 🔎
[📄] Are LLMs mindless token-shifters, or do they build meaningful representations of language? We study how LLMs copy text in-context, and physically separate out two types of induction heads: token heads, which copy literal tokens, and concept heads, which copy word meanings.
July 22, 2025 at 12:40 PM
I'm excited for NEMI again this year! I've enjoyed local research meetups and getting to know others near me working on interesting problems.
🚨 Registration is live! 🚨

The New England Mechanistic Interpretability (NEMI) Workshop is happening Aug 22nd 2025 at Northeastern University!

A chance for the mech interp community to nerd out on how models really work 🧠🤖

🌐 Info: nemiconf.github.io/summer25/
📝 Register: forms.gle/v4kJCweE3UUH...
June 30, 2025 at 11:00 PM
Reposted by Eric Todd
🚨 Registration is live! 🚨

The New England Mechanistic Interpretability (NEMI) Workshop is happening Aug 22nd 2025 at Northeastern University!

A chance for the mech interp community to nerd out on how models really work 🧠🤖

🌐 Info: nemiconf.github.io/summer25/
📝 Register: forms.gle/v4kJCweE3UUH...
June 30, 2025 at 10:55 PM
Reposted by Eric Todd
How do language models track the mental states of each character in a story, an ability often referred to as Theory of Mind?

We reverse-engineered how LLaMA-3-70B-Instruct handles a belief-tracking task and found something surprising: it uses mechanisms strikingly similar to pointer variables in C programming!
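For readers unfamiliar with the analogy, here is a toy sketch of what "pointer-like" belief tracking means; the story state is invented and this only illustrates the C-pointer comparison, not the circuit the paper actually finds:

```python
# Each character name acts like a pointer to a belief record; answering a question
# about a character amounts to dereferencing that pointer and reading the state.
beliefs = {
    "Alice": {"object": "keys", "location": "drawer"},  # made-up story state
    "Bob":   {"object": "keys", "location": "basket"},
}

def answer_where(character: str) -> str:
    record = beliefs[character]  # "dereference" the character pointer
    return f"{character} believes the {record['object']} are in the {record['location']}."

print(answer_where("Bob"))  # Bob believes the keys are in the basket.
```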
June 24, 2025 at 5:13 PM
Reposted by Eric Todd
Can we uncover the list of topics a language model is censored on?

Refused topics vary strongly across models; compare, for example, the refusal patterns of Claude-3.5 and DeepSeek-R1.
June 13, 2025 at 3:59 PM
Reposted by Eric Todd
I'll present a poster for this work at NENLP tomorrow! Come find me at poster #80...
[📄] Are LLMs mindless token-shifters, or do they build meaningful representations of language? We study how LLMs copy text in-context, and physically separate out two types of induction heads: token heads, which copy literal tokens, and concept heads, which copy word meanings.
April 10, 2025 at 9:19 PM
Reposted by Eric Todd
Sheridan asks whether the Dual Route Model of Reading that psychologists have observed in humans also appears in LLMs.

In her brilliantly simple study of induction heads, she finds that it does! Induction has a Dual Route that separates concepts from literal token processing.

Worth reading ↘️
[📄] Are LLMs mindless token-shifters, or do they build meaningful representations of language? We study how LLMs copy text in-context, and physically separate out two types of induction heads: token heads, which copy literal tokens, and concept heads, which copy word meanings.
April 7, 2025 at 3:23 PM
Reposted by Eric Todd
[📄] Are LLMs mindless token-shifters, or do they build meaningful representations of language? We study how LLMs copy text in-context, and physically separate out two types of induction heads: token heads, which copy literal tokens, and concept heads, which copy word meanings.
April 7, 2025 at 1:54 PM
Reposted by Eric Todd
What will be the linchpin for AI dominance?

Read our NSF/OSTP recommendations, written with Goodfire's Tom McGrath tommcgrath.github.io, Transluce's Sarah Schwettmann cogconfluence.com, and MIT's Dylan Hadfield-Menell @dhadfieldmenell.bsky.social

TL;DR: Dominance comes from **interpretability** 🧵 ↘️
March 16, 2025 at 1:57 PM
Reposted by Eric Todd
I'm searching for some comp/ling experts to provide a precise definition of “slop” as it refers to text (see: corp.oup.com/word-of-the-...)

I put together a Google form that should take no longer than 10 minutes to complete: forms.gle/oWxsCScW3dJU...
If you can help, I'd appreciate your input! 🙏
March 10, 2025 at 8:00 PM
Reposted by Eric Todd
Induction heads are commonly associated with in-context learning, but are they the primary driver of ICL at scale?

We find that recently discovered "function vector" heads, which encode the ICL task, are the actual primary mechanisms behind few-shot ICL!

arxiv.org/abs/2502.14010
🧵👇
February 28, 2025 at 4:16 PM
Reposted by Eric Todd
LLMs are known to perpetuate social biases in clinical tasks. Can we locate and intervene upon LLM activations that encode patient demographics like gender and race? 🧵

Work w/ @arnabsensharma.bsky.social, @silvioamir.bsky.social, @davidbau.bsky.social, @byron.bsky.social

arxiv.org/abs/2502.13319
February 22, 2025 at 4:18 AM
Reposted by Eric Todd
Please help amplify ARBOR, a fantastic new research opportunity! If you’d like to start contributing, NDIF is now hosting DeepSeek R1 8B and 70B, open for all researchers to experiment on via our API.

Sign up for API access here: login.ndif.us
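For anyone who wants to try it, NDIF remote inference goes through the nnsight library; here is a rough sketch of what a remote trace might look like (the model name, prompt, and layer index are placeholders, so check the NDIF/nnsight docs for the exact current API):

```python
# Sketch of a remote trace on NDIF via nnsight (assumed usage; verify against docs).
from nnsight import LanguageModel

# Placeholder model id for one of the hosted R1 distills.
model = LanguageModel("deepseek-ai/DeepSeek-R1-Distill-Llama-8B")

with model.trace("Why is the sky blue?", remote=True):  # remote=True runs on NDIF
    hidden = model.model.layers[10].output[0].save()    # save a hidden state

print(hidden.shape)
```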
February 20, 2025 at 10:35 PM
I'm excited about this new open research initiative! It kind of feels like this is how science is supposed to be done - collaborating and sharing ideas in the open. If you've thought about studying the mechanisms behind R1 & other reasoning models check it out!
Today we launch a new open research community.

It is called ARBOR:
arborproject.github.io/

Please join us.
bsky.app/profile/ajy...
February 20, 2025 at 11:15 PM
Reposted by Eric Todd
DeepSeek R1 shows how important it is to study the internals of reasoning models. Try our code: here, @canrager.bsky.social shows a method for auditing AI bias by probing the model's internal monologue.

dsthoughts.baulab.info

I'd be interested in your thoughts.
January 31, 2025 at 2:30 PM