Sheridan Feucht
@sfeucht.bsky.social
PhD student doing LLM interpretability with @davidbau.bsky.social and @byron.bsky.social. (they/them) https://sfeucht.github.io
If we do the same for token induction heads, we can also get a "token lens", which reads out surface-level token information from hidden states. Unlike raw logit lens, which reveals next-token predictions, "token lens" reveals the current token.
July 22, 2025 at 12:40 PM
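(Not our actual code, but a rough TransformerLens sketch of the general idea: push a hidden state through the OV circuit of a couple of heads and then through the unembedding, and read off the top tokens. The model and the (layer, head) indices below are placeholders, not the token induction heads identified in the paper.)

```python
# Rough "lens" sketch: read token-level info out of the residual stream via
# the OV circuit of a few (placeholder) heads, then the unembedding.
import torch
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")   # stand-in for Llama-2-7b
TOKEN_HEADS = [(5, 1), (5, 5)]                       # hypothetical head locations

tokens = model.to_tokens("she fixed the windowpane, then fixed the windowpane")
_, cache = model.run_with_cache(tokens)
pos = tokens.shape[1] - 1                            # read out at the last position

logits = torch.zeros(model.cfg.d_vocab)
for layer, head in TOKEN_HEADS:
    resid = cache["resid_pre", layer][0, pos]                 # [d_model]
    ov = model.W_V[layer, head] @ model.W_O[layer, head]      # [d_model, d_model]
    logits += (resid @ ov) @ model.W_U                        # [d_vocab]

print(model.to_str_tokens(logits.topk(5).indices))   # surface-level token readout
```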
If we apply concept lens to the word "cardinals" in three contexts, we see that Llama-2-7b has encoded this word very differently in each case!
July 22, 2025 at 12:40 PM
I'm on the train right now and just finished reading this paper for the first time. I actually logged back on to bsky just to link to it, but you beat me to the punch!

I really enjoyed your paper. This example was particularly great.
April 25, 2025 at 8:01 PM
Yin & Steinhardt (2025) recently showed that function vector (FV) heads are more important for ICL than token induction heads. But for translation, *concept* induction heads matter too! They copy forward word meanings, whereas FV heads influence the output language.
bsky.app/profile/kay...
April 7, 2025 at 1:54 PM
Concept heads also output language-agnostic word representations. If we patch the outputs of these heads from one translation prompt to another, we can change the *meaning* of the output word, without changing the language. (see prior work from @butanium.bsky.social and @wendlerc.bsky.social)
April 7, 2025 at 1:54 PM
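(For the curious, here's a rough activation-patching sketch in TransformerLens. The model, prompts, and head indices are placeholders rather than the actual concept heads from the paper.)

```python
# Patch the per-head outputs ("z") of placeholder "concept heads" from a
# source translation prompt into a destination prompt, then check whether the
# predicted word changes meaning while staying in the target language.
import transformer_lens.utils as utils
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")      # stand-in model
CONCEPT_HEADS = [(9, 6), (10, 1)]                       # hypothetical head locations

src = "English: cloud\nFrench:"                         # source meaning
dst = "English: door\nFrench:"                          # destination prompt
src_tokens, dst_tokens = model.to_tokens(src), model.to_tokens(dst)
_, src_cache = model.run_with_cache(src_tokens)

def patch_z(z, hook):
    # z: [batch, pos, head, d_head]; overwrite the final position of the
    # chosen heads in this layer with the source-prompt activations.
    for layer, head in CONCEPT_HEADS:
        if layer == hook.layer():
            z[:, -1, head] = src_cache["z", layer][:, -1, head]
    return z

hooks = [(utils.get_act_name("z", layer), patch_z) for layer, _ in CONCEPT_HEADS]
patched = model.run_with_hooks(dst_tokens, fwd_hooks=hooks)
print(model.to_str_tokens(patched[0, -1].topk(5).indices))  # did the meaning shift?
```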
Token induction heads are still important, though. When we ablate them over long sequences, models start to paraphrase instead of copying. We take this to mean that token induction heads are responsible for *exact* copying (which concept induction heads apparently can't do).
April 7, 2025 at 1:54 PM
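(Again, just a toy sketch rather than the paper's setup: zero-ablate a placeholder set of heads via hooks and compare the ablated generation to the intact one.)

```python
# Zero out the outputs of placeholder "token induction heads" while the model
# copies a nonsense string, and compare against the unablated generation.
import transformer_lens.utils as utils
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")
TOKEN_HEADS = [(5, 1), (5, 5), (6, 9)]                  # hypothetical head locations

def zero_token_heads(z, hook):
    # z: [batch, pos, head, d_head]; silence the chosen heads in this layer.
    for layer, head in TOKEN_HEADS:
        if layer == hook.layer():
            z[:, :, head] = 0.0
    return z

prompt = "Copy this exactly: hxioW2qN52 hxioW2qN52 hxioW2qN52 hxioW2qN"
hooks = [(utils.get_act_name("z", layer), zero_token_heads)
         for layer in {l for l, _ in TOKEN_HEADS}]

print(model.generate(prompt, max_new_tokens=10, do_sample=False))      # intact
with model.hooks(fwd_hooks=hooks):
    print(model.generate(prompt, max_new_tokens=10, do_sample=False))  # ablated
```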
But how do we know these heads copy semantics? When we ablate concept induction heads, performance drops drastically for translation, synonyms, and antonyms: all tasks that require copying *meaning*, not just literal tokens.
April 7, 2025 at 1:54 PM
Previous work showed that token induction heads attend to the next token to be copied (*window*pane). Analogously, we find that concept induction heads attend to the end of the next multi-token word to be copied (windowp*ane*).
April 7, 2025 at 1:54 PM
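(A quick sketch of how one can check where a head attends, with a placeholder model and head index: a token induction head should peak on the next token to be copied, a concept induction head on the last token of the upcoming word.)

```python
# Inspect a single head's attention pattern while the model is about to copy
# the multi-token word "windowpane".
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")
LAYER, HEAD = 5, 1                                      # hypothetical head location

tokens = model.to_tokens("she cleaned the windowpane, then cleaned the window")
_, cache = model.run_with_cache(tokens)

pattern = cache["pattern", LAYER][0, HEAD]              # [query_pos, key_pos]
query = tokens.shape[1] - 1                             # the final "window" token
top_key = pattern[query].argmax().item()
print("attends most to:", model.to_str_tokens(tokens)[top_key])
```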
We separate these heads out using causal interventions. Essentially, we pick out all of the attention heads that are responsible for promoting future entity tokens (e.g. "ax" in "waxwing"). We hypothesize that heads carrying an entire entity actually represent the *meaning* of that chunk of tokens.
April 7, 2025 at 1:54 PM
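(Roughly, the kind of intervention we mean looks like the sketch below, though this is an illustration rather than our exact procedure: ablate one head at a time and score it by how much the logit of the upcoming entity token drops. Model and prompt are placeholders.)

```python
# Score every head by a simple causal intervention: zero its output and
# measure how much the logit of the to-be-copied entity token drops.
import transformer_lens.utils as utils
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")
full = model.to_tokens("I spotted a waxwing today. What a lovely waxwing")
tokens, target = full[:, :-1], full[0, -1].item()       # predict the copied token

base = model(tokens)[0, -1, target].item()
scores = {}
for layer in range(model.cfg.n_layers):
    for head in range(model.cfg.n_heads):
        def ablate(z, hook, head=head):
            z[:, :, head] = 0.0                          # silence this one head
            return z
        ablated = model.run_with_hooks(
            tokens, fwd_hooks=[(utils.get_act_name("z", layer), ablate)])
        scores[(layer, head)] = base - ablated[0, -1, target].item()

# heads whose removal hurts the entity token most are candidate copying heads
print(sorted(scores, key=scores.get, reverse=True)[:5])
```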
There are multiple ways to copy text! Copying a wifi password like hxioW2qN52 is different from copying a meaningful one like OwlDoorGlass. Nonsense copying requires each char to be transferred one-by-one, but meaningful words can be copied all at once. Turns out, LLMs do both.
April 7, 2025 at 1:54 PM
[📄] Are LLMs mindless token-shifters, or do they build meaningful representations of language? We study how LLMs copy text in-context, and physically separate out two types of induction heads: token heads, which copy literal tokens, and concept heads, which copy word meanings.
April 7, 2025 at 1:54 PM
Japonaise Bakery in Brookline :) 🥐
November 24, 2024 at 8:57 PM