Deniz Bayazit
@bayazitdeniz.bsky.social
#NLProc PhD student @EPFL

5/ Looking closer, feature sharing has limits: in Hindi & Arabic, overlap stays low even at 341B tokens. This may be due to richer agreement systems (e.g., verbs agreeing w/ subjects & objects) forcing BLOOM to keep language-specific features—or simply data scarcity!
September 25, 2025 at 2:02 PM
4/ In #multilingual models, cross-language feature overlap starts low and rises with training. At 6B tokens in BLOOM, most detectors are language-specific or punctuation-related; by 341B tokens, shared cross-lingual features emerge, capturing syntactic abstractions over token patterns.
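One simple way to quantify "feature overlap" is Jaccard similarity between the sets of dictionary features that activate on each language's text. This is a hypothetical sketch, not necessarily the preprint's exact metric; `active_features` and the threshold are illustrative:

```python
import numpy as np

def active_features(acts: np.ndarray, threshold: float = 0.0) -> set:
    """Indices of dictionary features that fire on any token.

    acts: (n_tokens, n_features) feature activations on one
    language's corpus, e.g. from a crosscoder/SAE encoder.
    """
    return set(np.flatnonzero((acts > threshold).any(axis=0)))

def overlap(acts_a: np.ndarray, acts_b: np.ndarray) -> float:
    """Jaccard overlap between two languages' active feature sets."""
    a, b = active_features(acts_a), active_features(acts_b)
    return len(a & b) / len(a | b) if (a | b) else 0.0

# Toy example with 3 features; only feature 1 fires for both "languages".
en = np.array([[0.0, 2.0, 0.0],
               [1.5, 0.0, 0.0]])   # features {0, 1} active
hi = np.array([[0.0, 0.3, 0.9]])   # features {1, 2} active
print(overlap(en, hi))             # 1 shared / 3 total = 0.333...
```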
3/ Which features matter early but fade, and which gain importance later? In Pythia, token-level detectors drop out, while higher-level grammatical features—like plural-noun detectors and nouns formed from verbs (e.g., runner from run)—strengthen by 286B tokens.
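A hedged sketch of the bookkeeping this implies (the ablation-based score and the hook names `loss_fn` / `ablate_fn` are illustrative assumptions, not the paper's metric): score each feature at each checkpoint by how much ablating it hurts the model, then classify the trajectory:

```python
def causal_importance(loss_fn, ablate_fn, feature_id) -> float:
    """Importance of a feature = loss increase when it is ablated.

    loss_fn():      model loss on an eval batch (assumed hook)
    ablate_fn(fid): context manager zeroing feature fid (assumed hook)
    """
    base = loss_fn()
    with ablate_fn(feature_id):
        ablated = loss_fn()
    return ablated - base

def trajectory(scores: list, eps: float = 0.01) -> str:
    """Classify a feature's per-checkpoint importance curve."""
    delta = scores[-1] - scores[0]
    if delta > eps:
        return "strengthens"
    if delta < -eps:
        return "fades"
    return "stable"

# Toy importance curves over checkpoints (e.g. 6B -> 286B tokens):
print(trajectory([0.40, 0.25, 0.05]))  # token-level detector: fades
print(trajectory([0.02, 0.10, 0.30]))  # plural-noun detector: strengthens
```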
1/ 🚨 New preprint

How do #LLMs’ inner features change as they train? Using #crosscoders + a new causal metric, we map when features appear, strengthen, or fade across checkpoints—opening a new lens on training dynamics beyond loss curves & benchmarks.

#interpretability
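Roughly, a crosscoder learns one shared set of sparse latents with a separate decoder per model (here, per checkpoint), so the same feature can be compared across training stages. This toy forward pass is only a sketch of that idea; sizes and weights are made up:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_feat, n_ckpt = 8, 16, 3   # toy sizes

# Shared encoder over concatenated checkpoint activations;
# one decoder matrix per checkpoint.
W_enc = rng.normal(size=(n_ckpt * d_model, n_feat)) * 0.1
W_dec = rng.normal(size=(n_ckpt, n_feat, d_model)) * 0.1

def crosscoder_forward(acts):
    """acts: (n_ckpt, d_model) activations of the SAME prompt
    position taken from each training checkpoint."""
    x = acts.reshape(-1)                      # concat across checkpoints
    f = np.maximum(x @ W_enc, 0.0)            # shared ReLU latents
    recon = np.einsum("j,kjd->kd", f, W_dec)  # per-checkpoint reconstruction
    return f, recon

acts = rng.normal(size=(n_ckpt, d_model))
f, recon = crosscoder_forward(acts)
assert f.shape == (n_feat,) and recon.shape == (n_ckpt, d_model)
# Per-checkpoint decoder norms ||W_dec[k, j]|| indicate how strongly
# feature j is "present" at checkpoint k; tracking them over training
# is one way features could be seen to appear, strengthen, or fade.
```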