Raphaël Millière
@raphaelmilliere.com
Philosopher of Artificial Intelligence & Cognitive Science
https://raphaelmilliere.com/
But models also showed different sensitivities than humans. For example, top LLMs were more affected by permuting the order of examples and were more distracted by irrelevant semantic information, hinting at different underlying mechanisms. 7/9
August 11, 2025 at 8:02 AM
We found that the best-performing LLMs match human performance across many of our challenging new tasks. This provides evidence that sophisticated analogical reasoning can emerge from domain-general learning, succeeding where existing computational models fall short. 6/9
August 11, 2025 at 8:02 AM
In our second study, we highlighted the role of semantic content. Here, the task required identifying specific properties of concepts (e.g., "Is it a mammal?", "How many legs does it have?") and mapping them to features of the symbol strings. 5/9
August 11, 2025 at 8:02 AM
In our first study, we tested whether LLMs could map semantic relationships between concepts to symbolic patterns. We included controls such as permuting the order of examples or adding semantic distractors to test for robustness and content effects (see full list below). 4/9
August 11, 2025 at 8:02 AM
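To make these controls concrete, here is a toy sketch of the kind of manipulations involved; the item format, wording, and answer scheme are invented for illustration and are not the paper's actual stimuli.

```python
import random

# Invented few-shot items for illustration only; the paper's stimuli differ.
# Each example maps a pair of semantically related concepts to a symbolic pattern.
examples = [
    ("puppy : dog", "x : X"),
    ("kitten : cat", "y : Y"),
    ("cub : bear", "z : Z"),
]
test_item = "foal : horse -> ?"

def build_prompt(examples, test_item, permute=False, distractor=None, seed=0):
    """Assemble a prompt, optionally applying the two control manipulations."""
    rng = random.Random(seed)
    items = list(examples)
    if permute:                   # control 1: permute the order of the examples
        rng.shuffle(items)
    if distractor is not None:    # control 2: add an irrelevant but
        items.append(distractor)  # semantically loaded distractor example
    lines = [f"{src} -> {tgt}" for src, tgt in items]
    return "\n".join(lines + [test_item])

print(build_prompt(examples, test_item, permute=True,
                   distractor=("pepper : spicy", "q : Q")))
```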
Can LLMs reason by analogy like humans? We investigate this question in a new paper published in the Journal of Memory and Language (link below). This was a long-running but very rewarding project. Here are a few thoughts on our methodology and main findings. 1/9
August 11, 2025 at 8:02 AM
For example, a reasoning language model (RLM) asked to generate a hateful tirade may conclude in its reasoning trace that it should refuse; but if the prompt instructs it to assess each hateful sentence within its thinking process, it will often leak the full harmful content! (see example below) 9/13
June 10, 2025 at 1:39 PM
The problem is that these norms often conflict. For example, a request for dangerous information (violating “harmlessness”) can be framed as an educational query (appealing to “helpfulness”). Many issues with LLM behavior can be framed through these normative conflicts. 3/13
June 10, 2025 at 1:39 PM
Despite extensive safety training, LLMs remain vulnerable to “jailbreaking” through adversarial prompts. Why does this vulnerability persist? In a new paper published in Philosophical Studies, I argue this is because current alignment methods are fundamentally shallow. 🧵 1/13
June 10, 2025 at 1:39 PM
So, the model learns a circuit that encodes variables & values in distinct subspaces. How does it learn? Interestingly, the circuit does *not* replace earlier heuristics – it's built on top! Heuristics are still used when they work & the circuit activates when they fail.

11/13
June 3, 2025 at 1:19 PM
How is that possible? The residual stream acts as a kind of addressable memory. We find that the model learns to dedicate separate subspaces of the residual stream to encoding variable names and numerical constants. Causal interventions confirm their functional role.

10/13
June 3, 2025 at 1:19 PM
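To illustrate the kind of causal intervention this involves, here is a minimal sketch (not the paper's actual procedure): given an orthonormal basis V for a hypothesized subspace, swap the in-subspace components of two activations and check whether the model's behavior tracks the swap.

```python
import torch

def swap_subspace(resid_a, resid_b, V):
    """Swap the components of two residual-stream activations that lie in the
    subspace spanned by the orthonormal columns of V (e.g., a hypothesized
    'variable name' subspace), leaving the orthogonal complement untouched.

    resid_a, resid_b: [d_model] activations at the same layer and position
    V: [d_model, k] orthonormal basis for the subspace
    """
    P = V @ V.T                                # projector onto the subspace
    a_in, b_in = P @ resid_a, P @ resid_b      # in-subspace components
    return resid_a - a_in + b_in, resid_b - b_in + a_in

# If swapping only this component makes the model answer as though the two
# inputs' variable names had been exchanged (while the numerical constants are
# unaffected), that is causal evidence for the subspace's functional role.
```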
Patching individual attention heads reveals how they specialize and coordinate to route information: early heads handle the first hop in the assignment chain, mid-layer heads propagate subsequent hops, and late heads aggregate the answer at the query position.

9/13
June 3, 2025 at 1:19 PM
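A sketch of what per-head patching might look like in plain PyTorch, assuming a hypothetical module layout (model.blocks[i].attn.out_proj) rather than the paper's actual code:

```python
import torch

# Hypothetical layout: model.blocks[i].attn.out_proj is the attention output
# projection, whose input has shape [batch, seq, n_heads * d_head]. The clean
# and counterfactual prompts are assumed to be token-aligned.
@torch.no_grad()
def patch_head(model, tok, clean_prompt, corrupt_prompt, layer, head, n_heads):
    """Overwrite one attention head's output (pre-projection) in the corrupted
    run with its value from the clean run, and return the resulting logits."""
    out_proj = model.blocks[layer].attn.out_proj
    cache = {}

    def save(module, inputs):
        cache["z"] = inputs[0].detach().clone()        # concatenated head outputs

    def patch(module, inputs):
        z = inputs[0].clone()
        b, s, d = z.shape
        z = z.view(b, s, n_heads, d // n_heads)
        z_clean = cache["z"].view(b, s, n_heads, d // n_heads)
        z[:, :, head, :] = z_clean[:, :, head, :]      # splice in one clean head
        return (z.view(b, s, d),) + inputs[1:]

    h = out_proj.register_forward_pre_hook(save)
    model(tok(clean_prompt))                           # clean run: cache activations
    h.remove()

    h = out_proj.register_forward_pre_hook(patch)
    logits = model(tok(corrupt_prompt))                # corrupted run with the patch
    h.remove()
    return logits
```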
Patching the residual stream (the main information pathway between layers) shows that information about the correct value is dynamically routed across layers at token positions corresponding to each step of the query variable's assignment chain.

8/13
June 3, 2025 at 1:19 PM
How does the general mechanism learned in the final phase actually work? To find out, we used a causal intervention method called activation patching with counterfactual inputs to trace information flow across layers and identify causally responsible components.

7/13
June 3, 2025 at 1:19 PM
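For readers unfamiliar with the technique, here is a minimal sketch of activation patching with a counterfactual input in plain PyTorch; the module names and prompt format are assumptions, not the paper's implementation.

```python
import torch

# Assumed names: `model` is a decoder-only Transformer whose blocks are exposed
# as model.blocks[i] (each returning the residual stream), and `tok` maps a
# program string to input ids. A counterfactual pair differs minimally, e.g.
# only in the root constant ("a=5" vs "a=7"), so the prompts are token-aligned.
@torch.no_grad()
def patch_residual(model, tok, clean_prompt, corrupt_prompt, layer, pos):
    """Run the counterfactual (corrupted) prompt, but overwrite the residual
    stream at one (layer, position) with its value from the clean run."""
    cache = {}

    def save(module, inputs, output):
        out = output[0] if isinstance(output, tuple) else output
        cache["resid"] = out.detach().clone()

    def patch(module, inputs, output):
        out = output[0] if isinstance(output, tuple) else output
        out[:, pos, :] = cache["resid"][:, pos, :]     # splice in the clean activation
        return output

    h = model.blocks[layer].register_forward_hook(save)
    model(tok(clean_prompt))                           # clean run: cache activations
    h.remove()

    h = model.blocks[layer].register_forward_hook(patch)
    logits = model(tok(corrupt_prompt))                # patched counterfactual run
    h.remove()
    return logits

# Sweeping `layer` and `pos` and measuring how much each patch shifts the logit
# of the correct value maps where the relevant information is carried.
```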
In phase 1️⃣, the model only learns to predict random numbers. In phase 2️⃣, it learns to predict values from the first few lines of programs, which works surprisingly well for longer chains, but fails otherwise. In phase 3️⃣, it learns a systematic mechanism that generalizes.

6/13
June 3, 2025 at 1:19 PM
We observe three distinct phases in the model's learning trajectory, with sharp phase transitions characteristic of a "grokking" dynamic:

1️⃣ Random numerical prediction (≈12% test set accuracy)
2️⃣ Shallow heuristics (≈56%)
3️⃣ General solution that solves the task (>99.9%)

5/13
June 3, 2025 at 1:19 PM
We trained a Transformer from scratch on a variable dereferencing task. Given symbolic programs containing chains of assignments (a=5, b=a, etc.) plus irrelevant distractors, the model must trace the correct chain (up to 4 assignments deep) to find a queried variable's value.

4/13
June 3, 2025 at 1:19 PM
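A toy sketch of how such programs could be generated; the variable names, distractor scheme, and query format are invented for illustration, and the paper's actual dataset construction may differ.

```python
import random
import string

def make_program(depth=4, n_distractors=4, seed=None):
    """Toy generator for a variable-dereferencing program (illustrative only).

    One relevant chain (a=5, b=a, c=b, ...) is interleaved with a distractor
    chain that never feeds into the queried variable. Returns the program text
    and the value the query should resolve to.
    """
    rng = random.Random(seed)
    names = rng.sample(string.ascii_lowercase, depth + n_distractors)
    chain, decoys = names[:depth], names[depth:]

    value = rng.randint(0, 9)
    relevant = [f"{chain[0]}={value}"] + [
        f"{chain[i]}={chain[i-1]}" for i in range(1, depth)
    ]
    irrelevant = [f"{decoys[0]}={rng.randint(0, 9)}"] + [
        f"{decoys[i]}={decoys[i-1]}" for i in range(1, n_distractors)
    ]

    # Randomly interleave the two chains while preserving each one's order,
    # so every variable is defined before it is referenced.
    lines, a, b = [], list(relevant), list(irrelevant)
    while a or b:
        src = a if (a and (not b or rng.random() < 0.5)) else b
        lines.append(src.pop(0))

    return "\n".join(lines + [f"print({chain[-1]})"]), value

program, target = make_program(seed=0)
print(program)
print("# target value:", target)
```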
Variable binding – the ability to associate abstract variables with values – is fundamental to computation & cognition. Classical architectures implement this through addressable memory, but neural nets like Transformers lack such explicit mechanisms. Can they learn it?

3/13
June 3, 2025 at 1:19 PM
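For contrast, the classical symbolic solution is trivial: an addressable store maps names to values, and dereferencing just follows the chain of references. A toy illustration:

```python
# Classical implementation of variable binding: a symbol table with explicit
# lookups. Dereferencing follows references until a literal value is reached.
env = {"a": 5, "b": "a", "c": "b", "x": 7}   # c -> b -> a -> 5; x is a distractor

def deref(name, env):
    val = env[name]
    return deref(val, env) if isinstance(val, str) else val

print(deref("c", env))   # 5
```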
My article 'Constitutive Self-Consciousness' is now published online in the Australasian Journal of Philosophy. It argues (spoiler alert!) against the claim that self-consciousness is constitutive of consciousness.
December 17, 2024 at 11:29 AM
I look forward to speaking at the 12th Annual Marshall M. Weinberg Symposium @UMich on Friday, alongside researchers whose work I deeply admire – Yejin Choi, @melaniemitchell.bsky.social, and Paul Smolensky.

More information including link to the livestream: lsa.umich.edu/weinberginst...
March 25, 2024 at 6:06 PM
OpenAI unveiled its video generation model Sora two weeks ago. The technical report emphatically suggests that video generation models like Sora are world simulators. Are they? What does that even mean? I'm taking a deep dive into these questions in a new blog post (link below).
February 29, 2024 at 1:52 PM
Of course image diffusion models also fail to capture some aspects of the structure of natural images. For example, they fail to capture correct projective geometry. 11/
arxiv.org/abs/2311.17138
February 17, 2024 at 3:13 AM
The Sora tech report is scant on details, but we know it's a diffusion model w/ a ViT backbone that processes frame patches as tokens. This architecture is likely expressive enough for sophisticated internal structure to emerge with scale and diverse training data. 8/
February 17, 2024 at 3:12 AM
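As a rough illustration of what "processing frame patches as tokens" amounts to, here is a minimal sketch of spacetime patchification; all sizes and the module layout are assumptions, and Sora reportedly operates on compressed latents rather than raw pixels.

```python
import torch
import torch.nn as nn

def video_to_patch_tokens(video, patch=16, ft=2, d_model=1024):
    """Flatten a video [T, C, H, W] into a sequence of spacetime patch tokens.

    Each token covers an (ft x patch x patch) block, flattened and linearly
    projected to d_model dimensions. Sizes are illustrative only.
    """
    T, C, H, W = video.shape
    x = video.unfold(0, ft, ft)            # [T/ft, C, H, W, ft]
    x = x.unfold(2, patch, patch)          # [T/ft, C, H/p, W, ft, p]
    x = x.unfold(3, patch, patch)          # [T/ft, C, H/p, W/p, ft, p, p]
    x = x.permute(0, 2, 3, 1, 4, 5, 6).reshape(-1, C * ft * patch * patch)
    proj = nn.Linear(C * ft * patch * patch, d_model)
    return proj(x)                         # [n_tokens, d_model] for the ViT backbone

tokens = video_to_patch_tokens(torch.randn(16, 3, 256, 256))
print(tokens.shape)                        # torch.Size([2048, 1024])
```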
Of course, whether humans & animals have an IPE in this robust sense is up for debate. They can understand and predict the physical properties of objects and their interactions from a young age. Whether this literally involves IPE-style simulation is more controversial. 7/
February 17, 2024 at 3:12 AM
Of course, it's wildly unlikely that Sora literally makes function calls to an external physics engine like UE5 during inference. Note that this has been done before with LLMs; see this Google paper where the model answers questions through simulations with a physics engine. 2/
February 17, 2024 at 3:10 AM
If your curiosity is piqued, check out the details in the preprint! I hope it illustrates how philosophical work on AI ethics/safety can fruitfully interact with technical issues in DL. If you have comments and/or work on related issues, I'd love to hear from you! 18/18
November 7, 2023 at 7:36 PM