Important question: Do SAEs generalise?
We explore answerability detection in LLMs by comparing SAE features against linear residual-stream probes.
Answer:
Probes outperform SAE features in-domain, while out-of-domain generalisation varies sharply across features and datasets. 🧵