arxiv.org/abs/2509.01440
We propose a simple classifier-based selection, enabling multilingual LLMs 🧵
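A minimal sketch of what classifier-based selection usually looks like for data curation, under the assumption that a lightweight classifier scores candidate documents and only high-scoring ones are kept (seed sets, features, and the threshold below are illustrative, not the paper's setup):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Illustrative seed sets: desired multilingual text vs. undesired generic/noisy text.
positive_docs = ["ein kurzer deutscher Beispieltext", "un court exemple en français"]
negative_docs = ["random low quality web text", "spam spam spam click here"]

vectorizer = TfidfVectorizer(analyzer="char_wb", ngram_range=(1, 3))
X = vectorizer.fit_transform(positive_docs + negative_docs)
y = [1] * len(positive_docs) + [0] * len(negative_docs)
clf = LogisticRegression(max_iter=1000).fit(X, y)

def select(candidates, threshold=0.5):
    """Keep candidate documents the classifier scores above the threshold."""
    scores = clf.predict_proba(vectorizer.transform(candidates))[:, 1]
    return [doc for doc, s in zip(candidates, scores) if s >= threshold]

print(select(["noch ein kurzer deutscher Satz", "buy cheap pills now"]))
```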
Recycle gradients for faster neural net training with AdEMAmix iclr.cc/virtual/2025... (Fri Apr 25, 10 am).
1/3
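A minimal sketch of the AdEMAmix-style update as I read it: Adam's fast gradient EMA is complemented by a slow EMA that keeps "recycling" much older gradients (hyperparameters below are illustrative; the paper's alpha/beta3 schedulers and weight decay are omitted):

```python
import numpy as np

def ademamix_step(theta, grad, state, lr=1e-3, beta1=0.9, beta2=0.999,
                  beta3=0.9999, alpha=5.0, eps=1e-8):
    """One AdEMAmix-style update: Adam's fast EMA plus a slow EMA of old gradients."""
    state["t"] += 1
    t = state["t"]
    state["m1"] = beta1 * state["m1"] + (1 - beta1) * grad   # fast EMA, as in Adam
    state["m2"] = beta3 * state["m2"] + (1 - beta3) * grad   # slow EMA: "recycled" old gradients
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad**2  # second-moment EMA
    m1_hat = state["m1"] / (1 - beta1 ** t)                  # bias correction for the fast EMA
    v_hat = state["v"] / (1 - beta2 ** t)
    return theta - lr * (m1_hat + alpha * state["m2"]) / (np.sqrt(v_hat) + eps)

# Toy usage: minimize 0.5 * ||theta||^2, whose gradient is theta itself.
theta = np.array([1.0, -2.0])
state = {"m1": np.zeros_like(theta), "m2": np.zeros_like(theta),
         "v": np.zeros_like(theta), "t": 0}
for _ in range(200):
    theta = ademamix_step(theta, theta, state)
print(theta)
```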
My research interests include optimization, federated learning, machine learning, privacy, and unlearning.
Send your idea by the end of March 🏃♂️➡️, and run it on one of the largest public AI clusters globally. Everyone is eligible to apply!
swiss-ai.org
Slides are available on my website (link in thread).
🎉 New experiments with Llama and Gemma models in the updated paper!
Together with @danielepal.bsky.social, @matpagliardini.bsky.social, M. Jaggi and @francois.fleuret.org, we show that LLMs have a smaller effective depth than their nominal depth, which can be exploited to increase inference speed in multi-GPU settings!
arxiv.org/abs/2502.02790
(1/N)
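One way a reduced effective depth could translate into multi-GPU speedups, sketched under my own simplifying assumption (not necessarily the paper's exact scheme): if consecutive blocks contribute nearly independent residual updates, pairs of blocks can read the same input and run concurrently on different devices, halving the sequential depth.

```python
import torch
import torch.nn as nn

class Block(nn.Module):
    """Toy pre-norm block that returns only its residual update."""
    def __init__(self, d):
        super().__init__()
        self.norm = nn.LayerNorm(d)
        self.mlp = nn.Sequential(nn.Linear(d, 4 * d), nn.GELU(), nn.Linear(4 * d, d))

    def forward(self, x):
        return self.mlp(self.norm(x))

d, n_blocks = 64, 8
blocks = nn.ModuleList(Block(d) for _ in range(n_blocks))
x = torch.randn(2, 16, d)

# Sequential schedule (nominal depth): each block sees the previous block's output.
h = x
for blk in blocks:
    h = h + blk(h)

# "Shallower" schedule: each pair of consecutive blocks reads the same input and
# their residual updates are summed, so the two blocks of a pair could run
# concurrently on different GPUs.
g = x
for i in range(0, n_blocks, 2):
    g = g + blocks[i](g) + blocks[i + 1](g)

print((h - g).abs().mean())  # discrepancy between the two schedules
```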
We spent the last year (actually a bit longer) training an LLM with recurrent depth at scale.
The model has an internal latent space in which it can adaptively spend more compute to think longer.
I think the tech report ...🐦⬛
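A toy sketch of the recurrent-depth idea (illustrative, not the released architecture): a weight-tied core block is re-applied in latent space, and the number of iterations, i.e., how long the model "thinks", is chosen per input at test time.

```python
import torch
import torch.nn as nn

class RecurrentDepthLM(nn.Module):
    """Toy recurrent-depth model: one weight-tied core block re-applied in latent space."""
    def __init__(self, vocab=256, d=64):
        super().__init__()
        self.embed = nn.Embedding(vocab, d)
        self.core = nn.Sequential(nn.LayerNorm(d), nn.Linear(d, 4 * d), nn.GELU(),
                                  nn.Linear(4 * d, d))
        self.head = nn.Linear(d, vocab)

    def forward(self, tokens, num_iters=4):
        # More iterations = more compute spent in latent space, with no extra parameters;
        # num_iters can be chosen adaptively per input at inference time.
        h = self.embed(tokens)
        for _ in range(num_iters):
            h = h + self.core(h)
        return self.head(h)

model = RecurrentDepthLM()
tokens = torch.randint(0, 256, (1, 10))
fast = model(tokens, num_iters=2)       # cheap answer
thorough = model(tokens, num_iters=16)  # spend more test-time compute on the same input
print(fast.shape, thorough.shape)
```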
Particle filtering approach to improved inference w/o any training!
Check out probabilistic-inference-scaling.github.io
By Isha Puri et al. 📈🤖
Joint MIT CSAIL & Red Hat
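A minimal sketch of particle-filtering-style inference scaling (the generator and reward below are toy stand-ins, not the project's components): keep several candidate generations alive, weight them by a reward signal after each step, and resample, with no parameter updates anywhere.

```python
import numpy as np

rng = np.random.default_rng(0)

def extend(partial):
    """Stand-in for one LLM decoding step: append one toy token."""
    return partial + [int(rng.integers(0, 10))]

def reward(partial):
    """Stand-in for a (process) reward model; here it simply prefers larger tokens."""
    return float(sum(partial))

def particle_filter_decode(num_particles=8, num_steps=5, temperature=1.0):
    """Inference-time scaling via particle filtering: extend, weight, resample."""
    particles = [[] for _ in range(num_particles)]
    for _ in range(num_steps):
        particles = [extend(p) for p in particles]                 # propose one more step each
        w = np.array([reward(p) / temperature for p in particles])
        w = np.exp(w - w.max()); w /= w.sum()                      # normalized softmax weights
        idx = rng.choice(num_particles, size=num_particles, p=w)   # resample by weight
        particles = [list(particles[i]) for i in idx]
    return max(particles, key=reward)

print(particle_filter_decode())
```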
mistral.ai/news/mistral...