Deniz Bayazit
@bayazitdeniz.bsky.social
#NLProc PhD student @EPFL

#interpretability
6/ Concurrently, recent work uses sparse crosscoders to show broad phases of concept evolution (statistical → feature learning); with RelIE we track the causal dynamics of specific concepts over time and across languages, giving a fuller, finer-grained view.

arxiv.org/abs/2509.17196
Evolution of Concepts in Language Model Pre-Training
September 25, 2025 at 2:02 PM
5/ Looking closer, feature sharing has limits: in Hindi & Arabic, overlap stays low even at 341B tokens. This may be due to richer agreement systems (e.g., verbs agreeing w/ both subjects & objects) forcing BLOOM to keep language-specific features, or simply to data scarcity!
4/ In #multilingual models, cross-language feature overlap starts low and rises with training. At 6B tokens in BLOOM, most detectors are language-specific or fire on punctuation; by 341B tokens, shared cross-lingual features emerge, capturing syntactic abstractions over token patterns.
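To make the overlap measurement concrete, here is a minimal sketch of one way it could be computed (illustrative only: the detector threshold, the Jaccard metric, and the toy data are my assumptions, not the paper's exact procedure):

import numpy as np

# Hypothetical sketch: call a crosscoder feature a "detector" for a language
# if it fires on more than `threshold` of that language's tokens, then
# measure pairwise overlap of detector sets with Jaccard similarity.
def active_features(act_rates: np.ndarray, threshold: float = 0.05) -> set:
    return set(np.flatnonzero(act_rates > threshold))

def jaccard_overlap(rates_a: np.ndarray, rates_b: np.ndarray) -> float:
    a, b = active_features(rates_a), active_features(rates_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Toy data: activation rates of 1,000 features for four languages at one checkpoint.
rng = np.random.default_rng(0)
rates = {lang: rng.beta(0.5, 8.0, size=1000) for lang in ["en", "fr", "hi", "ar"]}
for la, lb in [("en", "fr"), ("hi", "ar")]:
    print(la, lb, round(jaccard_overlap(rates[la], rates[lb]), 3))

Repeating this per checkpoint would trace the low-then-rising overlap curves described above.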
3/ Which features matter early but fade, and which gain importance later? In Pythia, token-level detectors drop out, while higher-level grammatical features—like plural-noun detectors and nouns formed from verbs (e.g., runner from run)—strengthen by 286B tokens.
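A hedged sketch of how "matters early vs. gains importance later" could be operationalized, assuming per-feature importance scores (e.g., ablation effects) at each aligned checkpoint; the early/late split and the ratio test are illustrative choices, not the paper's method:

import numpy as np

def classify_trajectories(importance: np.ndarray, ratio: float = 2.0):
    """importance: (n_features, n_checkpoints) nonnegative scores.

    Flags features whose influence decays over training (e.g., token-level
    detectors) vs. grows (e.g., plural-noun detectors) by comparing mean
    importance over the first vs. second half of checkpoints.
    """
    half = importance.shape[1] // 2
    early = importance[:, :half].mean(axis=1) + 1e-9
    late = importance[:, half:].mean(axis=1) + 1e-9
    fading = np.flatnonzero(early / late > ratio)
    strengthening = np.flatnonzero(late / early > ratio)
    return fading, strengthening

imp = np.array([[0.50, 0.40, 0.10, 0.05],   # fades over training
                [0.01, 0.02, 0.20, 0.30]])  # strengthens over training
print(classify_trajectories(imp))  # -> (array([0]), array([1]))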
2/ We align critical checkpoints for a task with sparse crosscoders, measure each feature’s causal role, and introduce RelIE to compare their influence across checkpoints. This lets us trace how internal features shift—and when they matter—in models like Pythia, OLMo, and BLOOM.
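For intuition, a minimal sketch of a RelIE-style score, under my assumption that RelIE expresses a feature's indirect effect (IE) at one checkpoint relative to its IE across all aligned checkpoints; the paper's exact definition may differ:

import numpy as np

def relie(ie_per_checkpoint: np.ndarray) -> np.ndarray:
    """Share of a feature's total (absolute) indirect effect at each checkpoint.

    ie_per_checkpoint: 1-D array with the IE of ablating the same crosscoder
    feature (aligned across checkpoints) on a task metric.
    """
    mag = np.abs(ie_per_checkpoint)
    total = mag.sum()
    return mag / total if total > 0 else np.zeros_like(mag)

# Toy IEs for one feature at four checkpoints: its causal influence rises late.
print(relie(np.array([0.02, 0.05, 0.20, 0.40])))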