Ambroise Odonnat
ambroiseodt.bsky.social
Ph.D. student in Machine Learning at Inria.
Website: https://ambroiseodt.github.io/
Blog: https://logb-research.github.io
Congrats!
April 17, 2025 at 1:19 PM
📑Paper: arxiv.org/pdf/2410.02724
📈Slides: drive.google.com/file/d/1JDrV... (better with Adobe Reader for nice GIFs)
🌐Website: ambroiseodt.github.io
February 28, 2025 at 1:03 PM
Finally, I can't thank Wes and @viviencabannes.bsky.social enough for this collab: you are a rare combination of super-smart and fun to work with!

Hopefully, more to come soon🤠

"If I had to sum up my life today with you, I would say that it is, first and foremost, about encounters."
February 4, 2025 at 11:56 AM
We want to thank Elvis Dohmatob, Eshaan Nichani, @giupaolo.bsky.social , Faniriana Rakoto Endor, and Ievgen Redko for fruitful discussions during the elaboration of this work 😇
February 4, 2025 at 11:56 AM
On the theoretical side, we show that clustering heads can be learned via gradient descent and provide insights into the two-stage learning observed in practice.
6/🧵
February 4, 2025 at 11:56 AM
We investigate loss spikes, suggesting potential strategies for mitigation, which could lead to more stable training processes. We also peek into the transferability of circuits to showcase the usefulness of curriculum learning and data curation.
5/🧵
February 4, 2025 at 11:56 AM
In the second, we unveil "𝑪𝒍𝒖𝒔𝒕𝒆𝒓𝒊𝒏𝒈 𝑯𝒆𝒂𝒅𝒔", circuits that learn the invariance of the task. Their training dynamics unfold in two phases: 1) clustering of the attention embeddings according to the task's invariance and 2) classifier fitting.
4/🧵
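For the sparse modular addition task studied in this thread, the invariance the clustering heads pick up can be checked directly: the label only depends on the sum of the first k tokens modulo p, so permuting those tokens, or changing the ignored tail, leaves it unchanged. A toy check, with hypothetical values of p, L, k (not the paper's settings):

```python
import random

def target(seq, k, p):
    # Label: sum of the first k tokens, modulo p.
    return sum(seq[:k]) % p

p, L, k = 5, 8, 3
seq = [random.randrange(p) for _ in range(L)]

# Permuting the first k tokens does not change the label...
prefix = seq[:k]
random.shuffle(prefix)
permuted = prefix + seq[k:]
assert target(permuted, k, p) == target(seq, k, p)

# ...and neither does replacing the ignored tail.
other_tail = seq[:k] + [random.randrange(p) for _ in range(L - k)]
assert target(other_tail, k, p) == target(seq, k, p)
```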
February 4, 2025 at 11:56 AM
In the first paper, we show how GD (gradient descent) reinforces useful circuits in transformers while pruning others to create sub-circuits that help solve complex tasks by breaking them down into intermediate reasoning steps.

3/🧵
February 4, 2025 at 11:56 AM
We consider the 𝒔𝒑𝒂𝒓𝒔𝒆 𝒎𝒐𝒅𝒖𝒍𝒂𝒓 𝒂𝒅𝒅𝒊𝒕𝒊𝒐𝒏 problem, where the inputs are sequences of L tokens in the ring of integers modulo p and the corresponding targets are the sum of the first k terms modulo p. Formally, we aim to learn the mapping (x₁, …, x_L) ↦ x₁ + ⋯ + x_k mod p.

2/🧵
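A minimal sketch of this target mapping in code (the values of p, L, k below are hypothetical choices, not the paper's settings):

```python
import random

def sparse_modular_addition(seq, k, p):
    # Target: the sum of the first k tokens, modulo p.
    return sum(seq[:k]) % p

# Hypothetical small instance: sequences of L = 6 tokens in Z_5, sum the first k = 3.
p, L, k = 5, 6, 3
seq = [random.randrange(p) for _ in range(L)]
y = sparse_modular_addition(seq, k, p)  # the label the model must predict
```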
February 4, 2025 at 11:56 AM
Hi @vickiboykis.com, thanks for your interest. Don't hesitate to reach out if you have any questions on the paper: @ozekri.bsky.social and I would be happy to help :)
December 4, 2024 at 10:23 AM
Ahah, thanks, still a lot to learn before that 😅
December 3, 2024 at 9:35 PM
🤗This is joint work with Renchunzi Xie, Vasilii Feofanov, Weijian Deng, Jianfeng Zhang, and Bo An.

Finally, I want to thank @ramealexandre.bsky.social and Youssef Attia El Hili for fruitful discussions during the elaboration of this work.

🧵/🧵
December 3, 2024 at 4:58 PM
🥳Finally the awaited surprise!
Our work includes a result akin to that of
@petar-v.bsky.social in "softmax is not enough" (arxiv.org/pdf/2410.01104). We discuss its implications in the context of unsupervised accuracy estimation.

12/🧵
December 3, 2024 at 4:58 PM