Gonçalo Paulo
goncalo-paulo.bsky.social
Interpretability researcher at @eleutherai.bsky.social
We just updated the arXiv version!
*Automatically Interpreting Millions of Features in LLMs*
by @norabelrose.bsky.social et al.

An open-source pipeline for finding interpretable features in LLMs with sparse autoencoders and automated explainability methods from @eleutherai.bsky.social.

arxiv.org/abs/2410.13928
December 4, 2024 at 5:34 PM
Reposted by Gonçalo Paulo
*Automatically Interpreting Millions of Features in LLMs*
by @norabelrose.bsky.social et al.

An open-source pipeline for finding interpretable features in LLMs with sparse autoencoders and automated explainability methods from @eleutherai.bsky.social.

arxiv.org/abs/2410.13928
November 27, 2024 at 2:58 PM