🏂🎹🚵‍♀️🥋
Paper: arxiv.org/abs/2502.14010
The key differences are that previous works:
- measure ICL using differences between token losses, which we find behaves differently to few-shot ICL accuracy
- don't control for overlap between induction and FV
- focus on small models
- FV heads have relatively high induction scores compared to other heads, and induction heads likewise have relatively high FV scores
- FV heads emerge later in training than induction heads
- ICL accuracy rises around the same time induction emerges during training, but increases more gradually
Several instances of FV heads have a high induction score earlier in training (around when induction heads first emerge). However, the reverse (induction heads with high FV scores earlier) does not occur.
Our ablations show that FV heads are crucial for few-shot ICL, whereas induction heads are not necessary.
Check out our paper+code (w/ Terry Regier & Dan Klein) for more details and why we think that's the case: aclanthology.org/2024.acl-lon...
See you at TISLR in Ethiopia! ☀️ 8/8
Perceptual effort to distinguish between 2 handshapes is very weakly correlated with how often the 2 letters appear in similar contexts in English, and in the "wrong" direction for efficiency. 7/8