Mor Geva
megamor2.bsky.social
How can we interpret LLM features at scale? 🤔

Current pipelines use activating inputs, which is costly and ignores how features causally affect model outputs!
We propose efficient output-centric methods that better predict the steering effect of a feature.

New preprint led by @yoav.ml 🧵1/
January 28, 2025 at 7:34 PM