Kayo Yin
kayoyin.bsky.social
PhD student at UC Berkeley. NLP for signed languages and LLM interpretability. kayoyin.github.io
🏂🎹🚵‍♀️🥋
We speculate that induction heads help models learn the more complex FV mechanism, which ultimately drives in-context learning 🤔

Paper: arxiv.org/abs/2502.14010
Which Attention Heads Matter for In-Context Learning?
Large language models (LLMs) exhibit impressive in-context learning (ICL) capability, enabling them to perform new tasks using only a few demonstrations in the prompt. Two different mechanisms have be...
February 28, 2025 at 4:16 PM
How to reconcile this with previous studies on ICL?

The key differences are that previous works:
- measure ICL using differences between token losses, which we find behaves differently from few-shot ICL accuracy
- don't control for overlap between induction and FV
- focus on small models
February 28, 2025 at 4:16 PM
Other interesting findings:

- FV heads have relatively high induction scores compared to other heads, and induction heads likewise have relatively high FV scores
- FV heads emerge later in training than induction heads
- ICL accuracy rises around the same time induction emerges during training, but increases more gradually
February 28, 2025 at 4:16 PM
We also find evidence of induction heads that evolve into FV heads.

Several instances of FV heads have a high induction score earlier in training (around when induction heads first emerge). However, the reverse (induction heads with high FV scores earlier) does not occur.
February 28, 2025 at 4:16 PM
Two mechanisms have been proposed to explain ICL: induction heads that find and copy relevant tokens, and function vector (FV) heads that compute a latent encoding of the task from examples.

Our ablations show that FV heads are crucial for few-shot ICL, whereas induction heads are not necessary.
February 28, 2025 at 4:16 PM
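For anyone unfamiliar with the "induction score" mentioned in this thread, here is a minimal sketch of one common way to measure it. This is an assumed definition and not necessarily the exact computation in the paper: feed the model a random token sequence repeated twice, then average how much attention each position in the second copy pays to the token right after its previous occurrence. The function names and the synthetic attention matrices below are hypothetical and only for illustration; no real model is involved.

```python
import numpy as np


def induction_score(attn: np.ndarray, period: int) -> float:
    """Score one head's attention pattern on a sequence repeated twice.

    attn:   (2*period, 2*period) attention matrix, rows are query positions.
    period: length of the random sequence before it repeats.
    Assumed definition: mean attention from each query in the second copy
    to the position just after the previous occurrence of the same token.
    """
    seq_len = attn.shape[0]
    assert attn.shape == (seq_len, seq_len) and seq_len == 2 * period
    queries = np.arange(period, seq_len)   # positions in the second copy
    targets = queries - period + 1         # token after the earlier occurrence
    return float(attn[queries, targets].mean())


def ideal_induction_pattern(period: int) -> np.ndarray:
    """Synthetic head that puts all second-copy attention on the induction target."""
    n = 2 * period
    attn = np.zeros((n, n))
    attn[:period, 0] = 1.0                 # first copy: attend to the first token
    queries = np.arange(period, n)
    attn[queries, queries - period + 1] = 1.0
    return attn


def uniform_causal_pattern(period: int) -> np.ndarray:
    """Synthetic head that spreads attention evenly over all earlier positions."""
    n = 2 * period
    attn = np.tril(np.ones((n, n)))
    return attn / attn.sum(axis=1, keepdims=True)


ideal = induction_score(ideal_induction_pattern(4), period=4)
uniform = induction_score(uniform_causal_pattern(4), period=4)
print(f"ideal induction head: {ideal:.3f}, uniform head: {uniform:.3f}")
```

In practice the attention matrix would come from a real model's head, averaged over many random sequences; a head with a score close to 1 behaves like the synthetic "ideal" pattern above.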
Thanks for the kind words, Seth 😊 glad you joined the dinner!
December 17, 2024 at 1:56 AM
sad I’m not in town for this, looks super exciting!! 🍿
December 4, 2024 at 10:07 PM
oof yeah I was afraid something like that was maybe going on. I hope she gets the help she needs…
December 4, 2024 at 9:26 PM
ahh yes this is it thank you!! I hallucinated the end haha
November 26, 2024 at 11:20 PM
glad to know at least I didn’t just make this up 😭 I think I heard it recently too but can’t remember at all
November 26, 2024 at 12:05 PM
Overall, handshapes in native ASL signs reflect communicative efficiency, but *not in signs borrowed from English*!

Check out our paper+code (w/ Terry Regier & Dan Klein) for more details and why we think that's the case: aclanthology.org/2024.acl-lon...

See you at TISLR in Ethiopia! ☀️ 8/8
November 21, 2024 at 5:40 AM
What about perceptual effort - could it be correlated with English usage?

Perceptual effort to distinguish between 2 handshapes is very weakly correlated with how often the 2 letters appear in similar contexts in English, and in the "wrong" direction for efficiency. 7/8
November 21, 2024 at 5:40 AM