Raphaël Millière
@raphaelmilliere.com
Philosopher of Artificial Intelligence & Cognitive Science
https://raphaelmilliere.com/
How is that possible? The residual stream acts as a kind of addressable memory. We find that the model learns to dedicate separate subspaces of the residual stream to encode variable names and numerical constants. Causal interventions confirm their functional role.

10/13
June 3, 2025 at 1:19 PM
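The "separate subspaces" idea above can be pictured with a small sketch: if variable-name and value information live in different (here hypothetical, randomly chosen) orthonormal subspaces of a residual vector, a causal intervention can swap one component between two activations without disturbing the other. This is a schematic illustration, not the paper's actual bases.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 12  # toy residual-stream width

# Hypothetical orthonormal bases: one subspace assumed to carry
# variable-name information, another numerical values.
Q, _ = np.linalg.qr(rng.normal(size=(d, d)))
name_basis, value_basis = Q[:, :3], Q[:, 3:6]

def swap_subspace(h_a, h_b, basis):
    """Intervene on h_a: replace its component in `basis` with h_b's."""
    P = basis @ basis.T                  # projector onto the subspace
    return h_a - P @ h_a + P @ h_b

h_a = rng.normal(size=d)                 # residual vector from run A
h_b = rng.normal(size=d)                 # residual vector from run B
h_swapped = swap_subspace(h_a, h_b, value_basis)

# Only the value-subspace component changed; the name subspace is intact.
print(np.allclose(name_basis.T @ h_swapped, name_basis.T @ h_a))    # True
print(np.allclose(value_basis.T @ h_swapped, value_basis.T @ h_b))  # True
```

Because the two bases are orthogonal, editing one subspace leaves the other's readout untouched, which is what makes such interventions informative about functional role.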
Patching the residual stream (the main information pathway between layers) shows that information about the correct value is dynamically routed across layers at token positions corresponding to each step of the query variable's assignment chain.

8/13
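A layer-by-position patching sweep of the kind described above can be sketched with a toy numpy stand-in for a Transformer (a causal mixing matrix plays the role of attention; this is a schematic, not the paper's model): patch the counterfactual residual state into each (layer, position) site and measure how much the final output moves.

```python
import numpy as np

rng = np.random.default_rng(0)
n_layers, n_pos, d = 3, 5, 8
Ws = [rng.normal(scale=0.3, size=(d, d)) for _ in range(n_layers)]
# Causal averaging across positions: a crude stand-in for attention.
A = np.tril(np.ones((n_pos, n_pos))) / np.arange(1, n_pos + 1)[:, None]

def forward(x, patch=None):
    """patch = (layer, pos, vector): overwrite one residual-stream entry."""
    resid = x.copy()
    cache = []
    for l, W in enumerate(Ws):
        resid = resid + np.tanh(A @ resid @ W)   # residual update
        if patch is not None and patch[0] == l:
            resid[patch[1]] = patch[2].copy()    # causal intervention
        cache.append(resid.copy())
    return resid, cache

x_clean = rng.normal(size=(n_pos, d))
x_cf = rng.normal(size=(n_pos, d))               # counterfactual input
_, cache_cf = forward(x_cf)
out_clean, _ = forward(x_clean)

# Sweep layers x positions; large entries mark sites that route
# causally relevant information to the final position.
effect = np.zeros((n_layers, n_pos))
for l in range(n_layers):
    for p in range(n_pos):
        out_patched, _ = forward(x_clean, patch=(l, p, cache_cf[l][p]))
        effect[l, p] = np.linalg.norm(out_patched[-1] - out_clean[-1])
print(effect.round(2))
```

Reading the resulting grid across layers is how one sees information being "dynamically routed": the sites with large effects shift across token positions as depth increases.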
How does the general mechanism learned in the final phase actually work? To find out, we used a causal intervention method called activation patching with counterfactual inputs to trace information flow across layers and identify causally responsible components.

7/13
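The core move in activation patching can be shown with a minimal toy network (a schematic numpy sketch, not the actual Transformer): run the model on a clean input and on a counterfactual input, cache an internal activation from the counterfactual run, and splice it into the clean run. If the output shifts toward the counterfactual, that site is causally responsible for the behavior.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 2-layer network standing in for a layer stack (hypothetical).
W1 = rng.normal(size=(8, 16))
W2 = rng.normal(size=(16, 4))

def forward(x, patch_hidden=None):
    """Run the net; optionally overwrite ("patch") the hidden activation."""
    h = np.tanh(x @ W1)
    if patch_hidden is not None:
        h = patch_hidden                 # causal intervention at this site
    return h @ W2, h

x_clean = rng.normal(size=8)             # original input
x_cf = rng.normal(size=8)                # counterfactual input

out_clean, h_clean = forward(x_clean)
out_cf, h_cf = forward(x_cf)

# Splice the counterfactual hidden state into the clean run.
out_patched, _ = forward(x_clean, patch_hidden=h_cf)
print(np.allclose(out_patched, out_cf))  # True: this site fully determines the output
```

In a real Transformer the same logic is applied per layer and per token position, with a metric (e.g. logit difference) replacing the direct output comparison.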
In phase 1️⃣, the model only learns to predict random numbers. In phase 2️⃣, it learns to predict values from the first few lines of programs, which works surprisingly well for longer chains, but fails otherwise. In phase 3️⃣, it learns a systematic mechanism that generalizes.

6/13
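One way to picture the phase-2 shortcut versus the phase-3 general solution (an illustrative guess at what such a heuristic looks like, not the paper's analysis): a heuristic that only reads the first few lines succeeds whenever the relevant assignments happen to sit there, and returns nothing otherwise, while full dereferencing always follows the chain.

```python
def shallow_heuristic(lines, query, k=2):
    """Phase-2-style shortcut (illustrative): resolve the query using
    only the first k program lines, ignoring the rest."""
    env = dict(line.split("=") for line in lines[:k])
    val = env.get(query)
    while val is not None and not val.isdigit():
        val = env.get(val)
    return int(val) if val is not None and val.isdigit() else None

def dereference(lines, query):
    """Phase-3-style general solution: follow the whole chain."""
    env = dict(line.split("=") for line in lines)
    val = env[query]
    while not val.isdigit():
        val = env[val]
    return int(val)

prog = ["a=5", "b=a", "c=b", "x=7"]
print(shallow_heuristic(prog, "b"), dereference(prog, "b"))  # 5 5
print(shallow_heuristic(prog, "c"), dereference(prog, "c"))  # None 5
```

The shortcut's partial success explains an intermediate accuracy plateau well above chance but far below the systematic solution.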
We observe three distinct phases in the model's learning trajectory, with sharp phase transitions characteristic of a "grokking" dynamic:

1️⃣ Random numerical prediction (≈12% test set accuracy)
2️⃣ Shallow heuristics (≈56%)
3️⃣ General solution that solves the task (>99.9%)

5/13
We trained a Transformer from scratch on a variable dereferencing task. Given symbolic programs containing chains of assignments (e.g., a=5, b=a) plus irrelevant distractors, the model must trace the correct chain (up to 4 assignments deep) to find a queried variable's value.

4/13