Sheridan Feucht
sfeucht.bsky.social
PhD student doing LLM interpretability with @davidbau.bsky.social and @byron.bsky.social. (they/them) https://sfeucht.github.io
Try it out in our new paper demo notebook! Or ping me with any sequence to try and I'd be more than happy to run a few examples for you.
colab.research.google.com/github/sfeuc...

Also check out the new camera-ready version of the paper on arXiv.
arxiv.org/abs/2504.03022
July 22, 2025 at 12:40 PM
If we do the same for token induction heads, we can also get a "token lens", which reads out surface-level token information from states. Unlike raw logit lens, which reveals next-token predictions, "token lens" reveals the current token.
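As a rough sketch of the difference: raw logit lens multiplies a hidden state straight through the decoder, while "token lens" first applies the summed OV matrices of the token induction heads. Everything below is a toy stand-in with random weights, not the real model's tensors:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions; a real model's decoder and OV weights would come from
# the trained checkpoint, not a random generator.
d_model, vocab = 64, 100
hidden = rng.standard_normal(d_model)            # hidden state at one position
decoder = rng.standard_normal((vocab, d_model))  # unembedding ("decoder head")
token_ov_sum = rng.standard_normal((d_model, d_model))  # summed token-head OVs

# Raw logit lens: decode the state directly (reads next-token predictions).
logit_lens = decoder @ hidden

# "Token lens": transform with the token induction heads' summed OV matrix
# first, which instead reads out the *current* token.
token_lens = decoder @ (token_ov_sum @ hidden)
```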
July 22, 2025 at 12:40 PM
If we apply concept lens to the word "cardinals" in three contexts, we see that Llama-2-7b has encoded this word very differently in each case!
July 22, 2025 at 12:40 PM
To do this, we sum the OV matrices of the top-k concept induction heads, and use it to transform a hidden state at a particular token position. Projecting that to vocab space with the model's decoder head, we can access the "meaning" encoded in that state.
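In numpy-flavored pseudocode, the concept lens is just a matrix product. All dimensions, weights, and the choice of k below are hypothetical placeholders (Llama-2-7b's real d_model is 4096 with a ~32k vocabulary, and the OV/decoder matrices would come from the trained model):

```python
import numpy as np

rng = np.random.default_rng(0)

d_model, d_head, vocab, k = 64, 16, 100, 3  # toy sizes, k = number of top heads

# Hypothetical OV matrices for the top-k concept induction heads: each head's
# OV circuit maps d_model -> d_model through a d_head-sized bottleneck.
ov = [rng.standard_normal((d_model, d_head)) @ rng.standard_normal((d_head, d_model))
      for _ in range(k)]
ov_sum = sum(ov)  # sum the top-k OV matrices

decoder = rng.standard_normal((vocab, d_model))  # stand-in unembedding matrix
hidden = rng.standard_normal(d_model)            # hidden state at one position

# "Concept lens": transform the state with the summed OV matrix, then
# project to vocabulary space with the model's decoder head.
concept_logits = decoder @ (ov_sum @ hidden)
top5 = np.argsort(concept_logits)[-5:][::-1]  # ids of the top "meaning" tokens
```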
July 22, 2025 at 12:40 PM
I'm on the train right now and just finished reading this paper for the first time--I actually logged back on to bsky just so that I could link to it, but you beat me to the punch!

I really enjoyed your paper. This example was particularly great.
April 25, 2025 at 8:01 PM
That’s a good point! Sort of related, I noticed last night that when I have to type in a 2FA code I usually compress the numbers. Like if the code is 51692 I think “fifty-one, sixty-nine, two.” I wonder if this is a thing that people have studied. Thanks for the comment :)
April 9, 2025 at 12:42 AM
Yin & Steinhardt (2025) recently showed that FV heads are more important for ICL than token induction heads. But for translation, *concept* induction heads matter too! They copy forward word meanings, whereas FV heads influence the output language.
bsky.app/profile/kay...
April 7, 2025 at 1:54 PM
Concept heads also output language-agnostic word representations. If we patch the outputs of these heads from one translation prompt to another, we can change the *meaning* of the outputted word, without changing the language. (see prior work from @butanium.bsky.social and @wendlerc.bsky.social)
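The patching idea, reduced to a toy numpy sketch: cache a concept head's output on a "source" translation prompt and swap it into the forward pass of a "destination" prompt. The states, prompts, and `head_output` helper here are all hypothetical stand-ins for the real cached activations:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model = 64

def head_output(state, ov):
    """Toy stand-in for one attention head's contribution to the residual."""
    return ov @ state

# Hypothetical states for two translation prompts with different source words.
state_src = rng.standard_normal(d_model)  # prompt A (some word to translate)
state_dst = rng.standard_normal(d_model)  # prompt B (a different word)
ov = rng.standard_normal((d_model, d_model))  # stand-in concept-head OV

# Clean run on prompt B uses B's own head output...
residual_clean = state_dst + head_output(state_dst, ov)

# ...the patched run swaps in the head output cached from prompt A, carrying
# prompt A's language-agnostic word meaning into prompt B's forward pass.
residual_patched = state_dst + head_output(state_src, ov)
```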
April 7, 2025 at 1:54 PM
Token induction heads are still important, though. When we ablate them over long sequences, models start to paraphrase instead of copying. We take this to mean that token induction heads are responsible for *exact* copying (which concept induction heads apparently can't do).
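A zero-ablation like this can be sketched as simply dropping the chosen heads' contributions to the residual stream. The head indices and per-head outputs below are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_heads = 64, 8

# Toy per-head contributions to the residual stream at one token position.
head_outputs = rng.standard_normal((n_heads, d_model))
token_induction_heads = [2, 5]  # hypothetical indices of token induction heads

# Zero-ablate the token induction heads: remove their contribution entirely.
ablated = head_outputs.copy()
ablated[token_induction_heads] = 0.0

residual_clean = head_outputs.sum(axis=0)
residual_ablated = ablated.sum(axis=0)
```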
April 7, 2025 at 1:54 PM
But how do we know these heads copy semantics? When we ablate concept induction heads, performance drops drastically for translation, synonyms, and antonyms: all tasks that require copying *meaning*, not just literal tokens.
April 7, 2025 at 1:54 PM
Previous work showed that token induction heads attend to the next token to be copied (*window*pane). Analogously, we find that concept induction heads attend to the end of the next multi-token word to be copied (windowp*ane*).
April 7, 2025 at 1:54 PM
--using causal interventions. Essentially, we pick out all of the attention heads that are responsible for promoting future entity tokens (e.g. "ax" in "waxwing"). We hypothesize that heads carrying an entire entity actually represent the *meaning* of that chunk of tokens.
April 7, 2025 at 1:54 PM
Induction heads were discovered by Elhage et al. (2021) and Olsson et al. (2022). They focused on token copying, but some of the heads they found also seemed to activate for "fuzzy" copying tasks, like translation. We directly identify these heads--
transformer-circuits.pub/2022/in-con...
April 7, 2025 at 1:54 PM
There are multiple ways to copy text! Copying a wifi password like hxioW2qN52 is different from copying a meaningful one like OwlDoorGlass. Nonsense copying requires each character to be transferred one by one, but meaningful words can be copied all at once. Turns out, LLMs do both.
April 7, 2025 at 1:54 PM
So gorgeous, is this in Cambridge?
April 1, 2025 at 11:29 PM
Looks really cool! Can’t wait to give this a proper read.
March 12, 2025 at 1:38 PM