Simple word-pair ICL translation from English into another language requires more heads to be effective
Post-trained models are more steerable when more heads are used in the FV
(8/10)
FVs rely on the assumption that information required for ICL is stored and activated within a small subset of heads 🎯
But certain models require many heads in their FV before recovering performance 🙉🙉🙉➡️📈
(7/10)
Several model families, even after significant hyperparameter tuning, show no improvement or even a decline in relevant steering metrics 🧰📉
(6/10)
Even with the best-performing tool, FVs with full hyperparameter search, only 76% of model-task combinations recover 50% of 5-shot performance
(5/10)
DoLa assumes that the knowledge of an LM evolves gradually across layers
Correct/incorrect answer tokens have low probabilities before spiking at the same layer, suggesting contrasts with early layers are uninformative
(4/10)
✅ Consistent with prior work, we find that DoLa works decently for Llama 1 on TruthfulQA and FACTOR
❌ However, for all other models tested, the improvements afforded by DoLa in most metrics are negligible
(3/10)
We study 3 popular steering methods with 36 models from 14 families (1.5-70B), exposing brittle performance and fundamental flaws in underlying assumptions
🧵👇
(1/10)