Tiago Pimentel
@tpimentel.bsky.social
Postdoc at ETH. Formerly a PhD student at the University of Cambridge :)
This project was done with Finlay and @kmahowald.bsky.social, and it is the outcome of Finlay's Bachelor's thesis! Catch him presenting it at #EMNLP2025 :)

Paper: arxiv.org/abs/2509.26643
Code: github.com/Tr1ple-F/con...
Convergence and Divergence of Language Models under Different Random Seeds
October 1, 2025 at 6:08 PM
See our paper for more: we have analyses on other models, downstream tasks, and considering only subsets of tokens (e.g., only tokens with a certain part-of-speech)!
October 1, 2025 at 6:08 PM
This means that: (1) LMs can get less similar to each other, even while they all get closer to the true distribution; and (2) larger models reconverge faster, while small ones may never reconverge.
October 1, 2025 at 6:08 PM
* A sharp-divergence phase, where models diverge as they start using context.
* A slow-reconvergence phase, where predictions slowly become more similar again (especially in larger models).
October 1, 2025 at 6:08 PM
Surprisingly, convergence isn’t monotonic. Instead, we find four convergence phases across model training.
* A uniform phase, where all seeds output nearly-uniform distributions.
* A sharp-convergence phase, where models align, largely due to unigram frequency learning.
October 1, 2025 at 6:08 PM
In this paper, we define convergence as the similarity between outputs of LMs trained under different seeds, where similarity is measured as a per-token KL divergence. This lets us track whether models trained under identical settings, but different seeds, behave the same.
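For concreteness, here is a minimal sketch of this metric under illustrative assumptions (the checkpoint names and the example sentence are placeholders, not the paper's models or data):

```python
# Minimal sketch: per-token KL divergence between two causal LMs that
# (hypothetically) differ only in their training seed. The checkpoint names
# and the example sentence are placeholders, not the paper's setup.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("my-org/lm-seed-1")          # placeholder
lm_a = AutoModelForCausalLM.from_pretrained("my-org/lm-seed-1")  # placeholder
lm_b = AutoModelForCausalLM.from_pretrained("my-org/lm-seed-2")  # placeholder

text = "Language models trained under different seeds can disagree."
ids = tok(text, return_tensors="pt")

with torch.no_grad():
    logp_a = F.log_softmax(lm_a(**ids).logits, dim=-1)  # [1, T, |V|]
    logp_b = F.log_softmax(lm_b(**ids).logits, dim=-1)

# KL(p_a || p_b) at each position: sum_v p_a(v) * (log p_a(v) - log p_b(v)),
# then averaged over token positions.
kl_per_token = (logp_a.exp() * (logp_a - logp_b)).sum(dim=-1)  # [1, T]
print(kl_per_token.mean().item())
```

In the paper the per-token KL is an expectation over a held-out corpus (and over pairs of seeds); here it is simply averaged over the positions of one toy sentence.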
October 1, 2025 at 6:08 PM
Importantly, despite these results, we still believe causal abstraction is one of the best frameworks available for mech interpretability. Going forward, we should try to better understand how it is impacted by assumptions about how DNNs encode information. Longer 🧵 soon by @denissutter.bsky.social
July 14, 2025 at 12:15 PM
Overall, our results show that causal abstraction (and interventions) is not a silver bullet, as it relies on assumptions about how features are encoded in DNNs. We then connect our results to the linear representation hypothesis and to older debates in the probing literature.
July 14, 2025 at 12:15 PM
We show—both theoretically (under reasonable assumptions) and empirically (on real-world models)—that, if we allow variables to be encoded in arbitrarily complex subspaces of the DNN’s representations, any algorithm can be mapped to any model.
July 14, 2025 at 12:15 PM
Causal abstraction identifies this correspondence by finding subspaces in the DNN's hidden states which encode the algorithm’s hidden variables. Given such a map, we say the DNN implements the algorithm if the two behave identically under interventions.
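As a rough illustration (not the paper's code), an interchange intervention on a linear subspace of a toy network's hidden state could look like the sketch below; the network, the intervened layer, and the fixed orthonormal basis are all assumptions, and a method like distributed alignment search would learn the basis rather than fix it:

```python
# Minimal sketch of an interchange intervention on a linear subspace of a
# hidden layer. The toy network, the chosen layer, and the subspace basis are
# illustrative assumptions, not the setup from the paper.
import torch
import torch.nn as nn

torch.manual_seed(0)

class ToyMLP(nn.Module):
    def __init__(self, d_in=4, d_hidden=8, d_out=2):
        super().__init__()
        self.layer1 = nn.Linear(d_in, d_hidden)
        self.layer2 = nn.Linear(d_hidden, d_out)

    def forward(self, x, hidden_override=None):
        h = torch.relu(self.layer1(x))
        if hidden_override is not None:
            h = hidden_override  # intervene on the hidden state
        return self.layer2(h), h

model = ToyMLP()
base = torch.randn(1, 4)     # input whose prediction we intervene on
source = torch.randn(1, 4)   # input we take the variable's value from

# Basis of the subspace hypothesised to encode the high-level variable
# (here 2 fixed orthonormal directions; DAS would learn this rotation).
basis = torch.linalg.qr(torch.randn(8, 2)).Q   # [d_hidden, 2]

_, h_base = model(base)
_, h_source = model(source)

# Interchange: replace the base's component inside the subspace with the
# source's component, leaving the orthogonal complement untouched.
proj = basis @ basis.T                          # projector onto the subspace
h_patched = h_base - h_base @ proj + h_source @ proj

out_patched, _ = model(base, hidden_override=h_patched)
print(out_patched)
```

The abstraction test then asks whether `out_patched` matches what the high-level algorithm predicts once its corresponding hidden variable is set to the source input's value.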
July 14, 2025 at 12:15 PM
In this new paper, w/ @denissutter.bsky.social, @jkminder.bsky.social, and T. Hofmann, we study *causal abstraction*, a formal specification of when a deep neural network (DNN) implements an algorithm. This is the framework behind, e.g., distributed alignment search.

Paper: arxiv.org/abs/2507.08802
The Non-Linear Representation Dilemma: Is Causal Abstraction Enough for Mechanistic Interpretability?
July 14, 2025 at 12:15 PM