Ambroise Odonnat
ambroiseodt.bsky.social
Ph.D. student in Machine Learning at Inria.
Website: https://ambroiseodt.github.io/
Blog: https://logb-research.github.io
Congrats!
April 17, 2025 at 1:19 PM
📑Paper: arxiv.org/pdf/2410.02724
📈Slides: drive.google.com/file/d/1JDrV... (better with Adobe Reader for nice GIFs)
🌐Website: ambroiseodt.github.io
February 28, 2025 at 1:03 PM
Finally, I can't thank Wes and @viviencabannes.bsky.social enough for this collab: you are a rare combination of super-smart and fun to work with!

Hopefully, more to come soon🤠

"If I had to sum up my life today with you, I would say that it is, first and foremost, about encounters."
February 4, 2025 at 11:56 AM
We want to thank Elvis Dohmatob, Eshaan Nichani, @giupaolo.bsky.social , Faniriana Rakoto Endor, and Ievgen Redko for fruitful discussions during the elaboration of this work 😇
February 4, 2025 at 11:56 AM
On the theoretical side, we show that clustering heads can be learned via gradient descent and provide insights into the two-stage learning observed in practice.
6/🧵
February 4, 2025 at 11:56 AM
We investigate loss spikes, suggesting potential strategies for mitigation, which could lead to more stable training processes. We also peek into the transferability of circuits to showcase the usefulness of curriculum learning and data curation.
5/🧵
February 4, 2025 at 11:56 AM
In the second, we unveil "𝑪𝒍𝒖𝒔𝒕𝒆𝒓𝒊𝒏𝒈 𝑯𝒆𝒂𝒅𝒔", circuits that learn the invariance of the task. Their training dynamics unfold in two phases: 1) clustering of the attention embeddings according to the task's invariance and 2) classifier fitting.
4/🧵
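For the sparse modular addition task studied in this thread, the invariance the clustering heads pick up can be checked directly: the label only depends on the sum of the first k tokens modulo p, so permuting those tokens, or changing the ignored tail, leaves it unchanged. A toy check, with hypothetical values of p, L, k (not the paper's settings):

```python
import random

def target(seq, k, p):
    # Label: sum of the first k tokens, modulo p.
    return sum(seq[:k]) % p

p, L, k = 5, 8, 3
seq = [random.randrange(p) for _ in range(L)]

# Permuting the first k tokens does not change the label...
prefix = seq[:k]
random.shuffle(prefix)
permuted = prefix + seq[k:]
assert target(permuted, k, p) == target(seq, k, p)

# ...and neither does replacing the ignored tail.
other_tail = seq[:k] + [random.randrange(p) for _ in range(L - k)]
assert target(other_tail, k, p) == target(seq, k, p)
```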
February 4, 2025 at 11:56 AM
In the first paper, we show how GD (gradient descent) reinforces useful circuits in transformers while pruning others to create sub-circuits that help solve complex tasks by breaking them down into intermediate reasoning steps.

3/🧵
February 4, 2025 at 11:56 AM
We consider the 𝒔𝒑𝒂𝒓𝒔𝒆 𝒎𝒐𝒅𝒖𝒍𝒂𝒓 𝒂𝒅𝒅𝒊𝒕𝒊𝒐𝒏 problem, where the inputs are sequences of L tokens in the ring of integers modulo p and the corresponding targets are the sum of the first k terms modulo p. Formally, we aim to learn the mapping (x₁, …, x_L) ↦ x₁ + ⋯ + x_k mod p.

2/🧵
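A minimal sketch of this target mapping in code (the values of p, L, k below are hypothetical choices, not the paper's settings):

```python
import random

def sparse_modular_addition(seq, k, p):
    # Target: the sum of the first k tokens, modulo p.
    return sum(seq[:k]) % p

# Hypothetical small instance: sequences of L = 6 tokens in Z_5, sum the first k = 3.
p, L, k = 5, 6, 3
seq = [random.randrange(p) for _ in range(L)]
y = sparse_modular_addition(seq, k, p)  # the label the model must predict
```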
February 4, 2025 at 11:56 AM
Hi @vickiboykis.com, thanks for your interest. Don't hesitate to reach out if you have any questions on the paper: @ozekri.bsky.social and I would be happy to help :)
December 4, 2024 at 10:23 AM
Ahah, thanks, still a lot to learn before that 😅
December 3, 2024 at 9:35 PM
🤗This is joint work with Renchunzi Xie, Vasilii Feofanov, Weijian Deng, Jianfeng Zhang, and Bo An.

Finally, I want to thank @ramealexandre.bsky.social and Youssef Attia El Hili for fruitful discussions during the elaboration of this work.

🧵/🧵
December 3, 2024 at 4:58 PM
🥳Finally the awaited surprise!
Our work includes a result akin to that of
@petar-v.bsky.social in "softmax is not enough" (arxiv.org/pdf/2410.01104). We discuss its implications in the context of unsupervised accuracy estimation.

12/🧵
December 3, 2024 at 4:58 PM