Matteo Pagliardini
@matpagliardini.bsky.social
PhD student in ML at EPFL 🇨🇭working with Martin Jaggi & François Fleuret. Previously Apple MLR (intern). https://mpagli.github.io/
Reposted by Matteo Pagliardini
Using the 'right' data can hugely speed up LLM training, but how do you find the best training data in the vast sea of a whole web crawl?

We propose a simple classifier-based selection method, enabling multilingual LLMs 🧵
April 23, 2025 at 5:06 AM
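For context, a rough sketch of what a classifier-based selection pipeline could look like, assuming a generic bag-of-words quality classifier; the function names (`train_quality_classifier`, `select_top_documents`) and the keep fraction are illustrative, not from the paper.

```python
# Hedged sketch (not the authors' code): score web-crawl documents with a small
# quality classifier trained on "good" reference text, then keep the top fraction.
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import LogisticRegression

def train_quality_classifier(good_docs, random_docs):
    """Binary classifier: reference-quality text (1) vs. random crawl text (0)."""
    vectorizer = HashingVectorizer(n_features=2**18, alternate_sign=False)
    X = vectorizer.transform(good_docs + random_docs)
    y = [1] * len(good_docs) + [0] * len(random_docs)
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    return vectorizer, clf

def select_top_documents(docs, vectorizer, clf, keep_fraction=0.1):
    """Keep the highest-scoring fraction of the crawl for pretraining."""
    scores = clf.predict_proba(vectorizer.transform(docs))[:, 1]
    ranked = sorted(zip(scores, docs), key=lambda p: p[0], reverse=True)
    n_keep = max(1, int(keep_fraction * len(docs)))
    return [doc for _, doc in ranked[:n_keep]]
```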
Reposted by Matteo Pagliardini
#ICLR #TrainBetterLM I am at ICLR, come to our posters for improved language model training!

Recycle gradients for faster neural net training with AdEMAmix iclr.cc/virtual/2025... (Fri Apr 25, 10 am).

1/3
April 21, 2025 at 11:55 PM
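For context, a rough numpy sketch of an AdEMAmix-style update: a fast EMA of gradients is mixed with a much slower EMA that keeps old gradients around, on top of an Adam-like second-moment normalizer. The exact rule and the hyperparameter values below are illustrative, not copied from the paper.

```python
# Hedged sketch of an AdEMAmix-style step; see the paper/poster for the exact rule.
import numpy as np

def ademamix_step(theta, grad, state, lr=1e-3, beta1=0.9, beta2=0.999,
                  beta3=0.9999, alpha=5.0, eps=1e-8, step=1):
    m1, m2, nu = state["m1"], state["m2"], state["nu"]
    m1 = beta1 * m1 + (1 - beta1) * grad          # fast EMA of gradients
    m2 = beta3 * m2 + (1 - beta3) * grad          # slow EMA: long gradient memory
    nu = beta2 * nu + (1 - beta2) * grad ** 2     # second moment, as in Adam
    m1_hat = m1 / (1 - beta1 ** step)             # bias correction (fast EMA)
    nu_hat = nu / (1 - beta2 ** step)
    theta = theta - lr * (m1_hat + alpha * m2) / (np.sqrt(nu_hat) + eps)
    state.update(m1=m1, m2=m2, nu=nu)
    return theta, state
```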
Reposted by Matteo Pagliardini
I am excited to announce that I will join the University of Zurich as an assistant professor in August this year! I am looking for PhD students and postdocs starting from the fall.

My research interests include optimization, federated learning, machine learning, privacy, and unlearning.
March 6, 2025 at 2:17 AM
Reposted by Matteo Pagliardini
The Swiss AI Initiative has launched open calls for disruptive ideas: democratizing large-scale AI for the benefit of society.

Send your idea by the end of March 🏃‍♂️‍➡️, and run on one of the largest public AI clusters globally. Everyone is eligible to apply!

swiss-ai.org
March 4, 2025 at 11:13 PM
Reposted by Matteo Pagliardini
🤗Thanks a lot @haeggee.bsky.social and @mjaggi.bsky.social for having me in the MLO group at EPFL @icepfl.bsky.social to present "Large Language Models as Markov Chains".

Slides are available on my website (link in thread).

🎉 New experiments with Llama and Gemma models in the updated paper!
February 28, 2025 at 1:03 PM
Reposted by Matteo Pagliardini
What is the true depth of an LLM?

Together with @danielepal.bsky.social, @matpagliardini.bsky.social, M. Jaggi, and @francois.fleuret.org, we show that LLMs have a smaller effective depth than their layer count suggests, which can be exploited to increase inference speed in multi-GPU settings!

arxiv.org/abs/2502.02790
(1/N)
February 14, 2025 at 4:17 PM
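One way a reduced effective depth can translate into speed is to evaluate adjacent residual blocks from the same input and sum their updates, so the two calls can run on separate GPUs. The sketch below only illustrates that general idea and is not necessarily the paper's exact scheme.

```python
# Illustrative sketch: if two adjacent residual blocks barely depend on each
# other, they can be fed the same input and their updates summed, making the
# two calls independent and parallelizable across devices.

def sequential_pair(x, block_a, block_b):
    # Standard depth-wise execution: block_b sees block_a's output.
    x = x + block_a(x)
    x = x + block_b(x)
    return x

def parallel_pair(x, block_a, block_b):
    # "Shallower" approximation: both blocks see the same input x.
    return x + block_a(x) + block_b(x)
```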
Reposted by Matteo Pagliardini
Ok, so I can finally talk about this!

We spent the last year (actually a bit longer) training an LLM with recurrent depth at scale.

The model has an internal latent space in which it can adaptively spend more compute to think longer.

I think the tech report ...🐦‍⬛
February 10, 2025 at 4:48 PM
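Very roughly, the recurrent-depth idea can be sketched as a core block iterated a variable number of times over a latent state. The code below is an illustration with made-up module names (`prelude`, `core`, `coda`), not the released architecture.

```python
# Rough sketch of recurrent depth: spend more forward compute per token by
# iterating a core block in latent space, without emitting more tokens.
import torch
import torch.nn as nn

class RecurrentDepthLM(nn.Module):
    def __init__(self, prelude, core, coda):
        super().__init__()
        self.prelude, self.core, self.coda = prelude, core, coda

    def forward(self, tokens, num_iterations=8):
        h = self.prelude(tokens)                    # embed tokens into the latent space
        s = torch.zeros_like(h)                     # initial latent "thought" state
        for _ in range(num_iterations):             # adaptive compute: vary at test time
            s = self.core(torch.cat([s, h], dim=-1))  # core maps 2*d back to d
        return self.coda(s)                         # decode latent state into logits
```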
Reposted by Matteo Pagliardini
can we scale small, open LMs to o1 level? Using classical probabilistic inference methods, YES!

A particle filtering approach to improved inference w/o any training!
Check out probabilistic-inference-scaling.github.io

By Aisha Puri et al. 📈🤖
Joint MIT-CSAIL & RedHat
February 7, 2025 at 8:05 PM
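As a sketch of the general recipe (not the project's exact algorithm): keep a population of partial generations, weight them with a reward model after each step, and resample. Here `extend_step` and `reward_fn` are hypothetical stand-ins for the LM sampler and the reward model.

```python
# Hedged sketch of particle-filtering-style inference scaling.
import math
import random

def particle_filter_decode(extend_step, reward_fn, prompt, num_particles=8, num_steps=10):
    particles = [prompt] * num_particles
    for _ in range(num_steps):
        # 1) Propose: extend every particle by one reasoning step (LM samples).
        particles = [extend_step(p) for p in particles]
        # 2) Weight: score each partial generation with a reward model.
        weights = [math.exp(reward_fn(p)) for p in particles]
        total = sum(weights)
        probs = [w / total for w in weights]
        # 3) Resample: promising particles get duplicated, weak ones dropped.
        particles = random.choices(particles, weights=probs, k=num_particles)
    return max(particles, key=reward_fn)
```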
Reposted by Matteo Pagliardini
new open-weights 24B model with performance comparable to Llama 3.3 70B 😮. Congrats Mistral team!
mistral.ai/news/mistral...
January 30, 2025 at 7:01 PM
Reposted by Matteo Pagliardini
1/ 📘 Could ChatGPT get an engineering degree? Spoiler, yes! In our new @pnas.org article, we explore how AI assistants like GPT-4 perform in STEM university courses — and on average they pass a staggering 91.7% of core courses. 🧵 #AI #HigherEd #STEM #LLMs #NLProc
December 4, 2024 at 2:53 PM
Reposted by Matteo Pagliardini
New blog post on flow matching: dl.heeere.com/cfm/

Contains some nice visuals too!
November 27, 2024 at 12:53 PM
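For reference, a minimal conditional flow matching training step with the standard linear interpolation path; `model(x_t, t)` is assumed to predict the velocity field, and this sketch is not taken from the blog post.

```python
# Minimal conditional flow matching loss (linear path from noise x0 to data x1).
import torch

def flow_matching_loss(model, x1):
    """x1: a batch of data samples; x0 is Gaussian noise."""
    x0 = torch.randn_like(x1)
    t = torch.rand(x1.shape[0], *([1] * (x1.dim() - 1)), device=x1.device)
    x_t = (1 - t) * x0 + t * x1        # point on the straight path from x0 to x1
    target_velocity = x1 - x0          # velocity of that path, constant in t
    pred = model(x_t, t)               # model predicts the velocity field
    return ((pred - target_velocity) ** 2).mean()
```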