patqdasilva.bsky.social
@patqdasilva.bsky.social
✨Localization hypothesis does not always hold for FVs✨

Simple word-pair ICL translation from English into another language requires more heads to be effective

Post-trained models are more steerable when more heads are used in the FV
(8/10)
April 8, 2025 at 11:34 AM
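A minimal sketch of the head-count sweep implied above, assuming a hypothetical `steered_accuracy` evaluator and the `build_function_vector` helper sketched after the next post (all names are ours, not from the paper):

```python
# Sketch: vary how many attention heads are summed into the FV and record
# steering accuracy at each head budget. `head_outputs`, `ranked_heads`, and
# `steered_accuracy` are hypothetical placeholders.
def sweep_head_budget(head_outputs, ranked_heads, budgets=(1, 2, 5, 10, 20, 50, 100)):
    results = {}
    for k in budgets:
        fv = build_function_vector(head_outputs, ranked_heads[:k])
        results[k] = steered_accuracy(fv)  # zero-shot accuracy with the FV added
    return results
```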
✨Localization hypothesis does not always hold for FVs✨

FVs rely on the assumption that information required for ICL is stored and activated within a small subset of heads 🎯

But certain models require many heads in their FV before recovering performance 🙉🙉🙉➡️📈
(7/10)
April 8, 2025 at 11:34 AM
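For reference, in the original FV formulation a function vector is roughly a sum of mean task-conditioned attention-head outputs from a small selected subset of heads; a minimal PyTorch sketch under that assumption, with our own variable names:

```python
import torch

# head_outputs: dict mapping (layer, head) -> tensor of shape [n_icl_prompts, d_model],
# each head's output at the final token of the ICL prompts (assumed pre-computed;
# the naming here is ours). selected_heads: the small subset the localization
# hypothesis points to, e.g. heads ranked by their causal effect on the task.
def build_function_vector(head_outputs, selected_heads):
    # Mean task-conditioned output per head, summed across the selected subset.
    return sum(head_outputs[h].mean(dim=0) for h in selected_heads)
```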
🔔Neither Function Vectors nor Task Vectors are generalizable🔔

Several model families, even after significant hyperparameter tuning, show no improvement, or even a decline, in the relevant steering metrics🧰📉
(6/10)
April 8, 2025 at 11:34 AM
🔔Neither Function Vectors nor Task Vectors are generalizable🔔

Even with the best-performing method, FVs with a full hyperparameter search, only 76% of model-task combinations recover 50% of 5-shot performance
(5/10)
April 8, 2025 at 11:34 AM
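The recovery criterion above can be read as a simple ratio; a sketch of that threshold check (our formulation, not code from the paper):

```python
def recovers_half_of_few_shot(steered_zero_shot_acc, five_shot_acc, threshold=0.5):
    # A model-task pair "recovers" if steered zero-shot accuracy reaches at least
    # `threshold` of the 5-shot ICL accuracy for that pair.
    if five_shot_acc == 0:
        return False
    return steered_zero_shot_acc / five_shot_acc >= threshold
```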
DoLa’s poor efficacy could stem from the flawed assumption that the factual knowledge of an LM evolves gradually across layers

Correct and incorrect answer tokens both have low probabilities before spiking at the same layer, suggesting that contrasts with early layers are uninformative
(4/10)
April 8, 2025 at 11:34 AM
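One way to see a per-layer "spike" like the one described above is a logit-lens style probe: decode every intermediate hidden state with the final norm and unembedding, and track the answer token's probability at each layer. A sketch assuming a Llama-style Hugging Face model where `model.model.norm` and `model.lm_head` exist (module names vary by family):

```python
import torch

@torch.no_grad()
def answer_prob_per_layer(model, input_ids, answer_id):
    out = model(input_ids, output_hidden_states=True)
    probs = []
    for h in out.hidden_states[1:]:                       # skip the embedding layer
        logits = model.lm_head(model.model.norm(h[:, -1, :]))
        probs.append(logits.softmax(-1)[0, answer_id].item())
    return probs   # probability of the answer token, layer by layer
```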
DoLa contrasts token probabilities across layers to enhance factuality

✅Consistent with prior work, we find that DoLa works decently for Llama 1 on TruthfulQA and FACTOR

❌However, for all other models tested, the improvements afforded by DoLa are negligible on most metrics
(3/10)
April 8, 2025 at 11:34 AM
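DoLa's core operation is to contrast the final layer's distribution with a premature layer's; a minimal sketch of that contrast (premature-layer selection and DoLa's adaptive plausibility constraint are omitted):

```python
import torch

def dola_contrast(final_logits, premature_logits):
    # Contrast the mature (final-layer) distribution against a premature layer:
    # tokens whose probability grows between the two layers get promoted.
    final_logp = final_logits.log_softmax(-1)
    premature_logp = premature_logits.log_softmax(-1)
    return final_logp - premature_logp   # used in place of raw logits when decoding
```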
Steering language models by directly intervening on internal activations is appealing, but does it generalize?

We study 3 popular steering methods with 36 models from 14 families (1.5B-70B parameters), exposing brittle performance and fundamental flaws in their underlying assumptions
🧵👇
(1/10)
April 8, 2025 at 11:34 AM
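"Directly intervening on internal activations" typically means adding a steering vector to the residual stream during the forward pass; a generic PyTorch forward-hook sketch, assuming a Llama-style model where `model.model.layers` exists (layer index, scale, and vector are placeholders):

```python
import torch

def add_steering_hook(model, layer_idx, vector, scale=1.0):
    # Register a forward hook on one decoder layer that adds `vector`
    # to the hidden states flowing through it.
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        hidden += scale * vector.to(hidden.dtype)
        return output
    return model.model.layers[layer_idx].register_forward_hook(hook)

# handle = add_steering_hook(model, layer_idx=15, vector=steering_vec)
# ...generate with the hook active...
# handle.remove()
```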