Raphaël Millière
@raphaelmilliere.com
Philosopher of Artificial Intelligence & Cognitive Science
https://raphaelmilliere.com/
How is that possible? The residual stream acts as a kind of addressable memory. We find that the model learns to dedicate separate subspaces of the residual stream to encode variable names and numerical constants. Causal interventions confirm their functional role.

10/13
June 3, 2025 at 1:19 PM
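The "separate subspaces" idea above can be pictured with a small sketch: if variable-name and value information live in different (here hypothetical, randomly chosen) orthonormal subspaces of a residual vector, a causal intervention can swap one component between two activations without disturbing the other. This is a schematic illustration, not the paper's actual bases.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 12  # toy residual-stream width

# Hypothetical orthonormal bases: one subspace assumed to carry
# variable-name information, another numerical values.
Q, _ = np.linalg.qr(rng.normal(size=(d, d)))
name_basis, value_basis = Q[:, :3], Q[:, 3:6]

def swap_subspace(h_a, h_b, basis):
    """Intervene on h_a: replace its component in `basis` with h_b's."""
    P = basis @ basis.T                  # projector onto the subspace
    return h_a - P @ h_a + P @ h_b

h_a = rng.normal(size=d)                 # residual vector from run A
h_b = rng.normal(size=d)                 # residual vector from run B
h_swapped = swap_subspace(h_a, h_b, value_basis)

# Only the value-subspace component changed; the name subspace is intact.
print(np.allclose(name_basis.T @ h_swapped, name_basis.T @ h_a))    # True
print(np.allclose(value_basis.T @ h_swapped, value_basis.T @ h_b))  # True
```

Because the two bases are orthogonal, editing one subspace leaves the other's readout untouched, which is what makes such interventions informative about functional role.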
Patching the residual stream (the main information pathway between layers) shows that information about the correct value is dynamically routed across layers at token positions corresponding to each step of the query variable's assignment chain.

8/13
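A layer-by-position patching sweep of the kind described above can be sketched with a toy numpy stand-in for a Transformer (a causal mixing matrix plays the role of attention; this is a schematic, not the paper's model): patch the counterfactual residual state into each (layer, position) site and measure how much the final output moves.

```python
import numpy as np

rng = np.random.default_rng(0)
n_layers, n_pos, d = 3, 5, 8
Ws = [rng.normal(scale=0.3, size=(d, d)) for _ in range(n_layers)]
# Causal averaging across positions: a crude stand-in for attention.
A = np.tril(np.ones((n_pos, n_pos))) / np.arange(1, n_pos + 1)[:, None]

def forward(x, patch=None):
    """patch = (layer, pos, vector): overwrite one residual-stream entry."""
    resid = x.copy()
    cache = []
    for l, W in enumerate(Ws):
        resid = resid + np.tanh(A @ resid @ W)   # residual update
        if patch is not None and patch[0] == l:
            resid[patch[1]] = patch[2].copy()    # causal intervention
        cache.append(resid.copy())
    return resid, cache

x_clean = rng.normal(size=(n_pos, d))
x_cf = rng.normal(size=(n_pos, d))               # counterfactual input
_, cache_cf = forward(x_cf)
out_clean, _ = forward(x_clean)

# Sweep layers x positions; large entries mark sites that route
# causally relevant information to the final position.
effect = np.zeros((n_layers, n_pos))
for l in range(n_layers):
    for p in range(n_pos):
        out_patched, _ = forward(x_clean, patch=(l, p, cache_cf[l][p]))
        effect[l, p] = np.linalg.norm(out_patched[-1] - out_clean[-1])
print(effect.round(2))
```

Reading the resulting grid across layers is how one sees information being "dynamically routed": the sites with large effects shift across token positions as depth increases.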
How does the general mechanism learned in the final phase actually work? To find out, we used a causal intervention method called activation patching with counterfactual inputs to trace information flow across layers and identify causally responsible components.

7/13
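The core move in activation patching can be shown with a minimal toy network (a schematic numpy sketch, not the actual Transformer): run the model on a clean input and on a counterfactual input, cache an internal activation from the counterfactual run, and splice it into the clean run. If the output shifts toward the counterfactual, that site is causally responsible for the behavior.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 2-layer network standing in for a layer stack (hypothetical).
W1 = rng.normal(size=(8, 16))
W2 = rng.normal(size=(16, 4))

def forward(x, patch_hidden=None):
    """Run the net; optionally overwrite ("patch") the hidden activation."""
    h = np.tanh(x @ W1)
    if patch_hidden is not None:
        h = patch_hidden                 # causal intervention at this site
    return h @ W2, h

x_clean = rng.normal(size=8)             # original input
x_cf = rng.normal(size=8)                # counterfactual input

out_clean, h_clean = forward(x_clean)
out_cf, h_cf = forward(x_cf)

# Splice the counterfactual hidden state into the clean run.
out_patched, _ = forward(x_clean, patch_hidden=h_cf)
print(np.allclose(out_patched, out_cf))  # True: this site fully determines the output
```

In a real Transformer the same logic is applied per layer and per token position, with a metric (e.g. logit difference) replacing the direct output comparison.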
In phase 1️⃣, the model only learns to predict random numbers. In phase 2️⃣, it learns to predict values from the first few lines of programs, which works surprisingly well for longer chains, but fails otherwise. In phase 3️⃣, it learns a systematic mechanism that generalizes.

6/13
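One way to picture the phase-2 shortcut versus the phase-3 general solution (an illustrative guess at what such a heuristic looks like, not the paper's analysis): a heuristic that only reads the first few lines succeeds whenever the relevant assignments happen to sit there, and returns nothing otherwise, while full dereferencing always follows the chain.

```python
def shallow_heuristic(lines, query, k=2):
    """Phase-2-style shortcut (illustrative): resolve the query using
    only the first k program lines, ignoring the rest."""
    env = dict(line.split("=") for line in lines[:k])
    val = env.get(query)
    while val is not None and not val.isdigit():
        val = env.get(val)
    return int(val) if val is not None and val.isdigit() else None

def dereference(lines, query):
    """Phase-3-style general solution: follow the whole chain."""
    env = dict(line.split("=") for line in lines)
    val = env[query]
    while not val.isdigit():
        val = env[val]
    return int(val)

prog = ["a=5", "b=a", "c=b", "x=7"]
print(shallow_heuristic(prog, "b"), dereference(prog, "b"))  # 5 5
print(shallow_heuristic(prog, "c"), dereference(prog, "c"))  # None 5
```

The shortcut's partial success explains an intermediate accuracy plateau well above chance but far below the systematic solution.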
We observe three distinct phases in the model's learning trajectory, with sharp phase transitions characteristic of a "grokking" dynamic:

1️⃣ Random numerical prediction (≈12% test set accuracy)
2️⃣ Shallow heuristics (≈56%)
3️⃣ General solution that solves the task (>99.9%)

5/13
We trained a Transformer from scratch on a variable dereferencing task. Given symbolic programs containing chains of assignments (e.g., a=5, b=a) plus irrelevant distractors, the model must trace the correct chain (up to 4 assignments deep) to find a queried variable's value.

4/13