arxiv.org/abs/2509.01440
We propose a simple classifier-based selection, enabling multilingual LLMs 🧵
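A minimal sketch of what classifier-based selection usually looks like for data curation, under the assumption that a lightweight classifier scores candidate documents and only high-scoring ones are kept (seed sets, features, and the threshold below are illustrative, not the paper's setup):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Illustrative seed sets: desired multilingual text vs. undesired generic/noisy text.
positive_docs = ["ein kurzer deutscher Beispieltext", "un court exemple en français"]
negative_docs = ["random low quality web text", "spam spam spam click here"]

vectorizer = TfidfVectorizer(analyzer="char_wb", ngram_range=(1, 3))
X = vectorizer.fit_transform(positive_docs + negative_docs)
y = [1] * len(positive_docs) + [0] * len(negative_docs)
clf = LogisticRegression(max_iter=1000).fit(X, y)

def select(candidates, threshold=0.5):
    """Keep candidate documents the classifier scores above the threshold."""
    scores = clf.predict_proba(vectorizer.transform(candidates))[:, 1]
    return [doc for doc, s in zip(candidates, scores) if s >= threshold]

print(select(["noch ein kurzer deutscher Satz", "buy cheap pills now"]))
```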
Recycle gradients for faster neural net training with AdEMAmix iclr.cc/virtual/2025... (Fri Apr 25, 10 am).
1/3
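A minimal sketch of the AdEMAmix-style update as I read it: Adam's fast gradient EMA is complemented by a slow EMA that keeps "recycling" much older gradients (hyperparameters below are illustrative; the paper's alpha/beta3 schedulers and weight decay are omitted):

```python
import numpy as np

def ademamix_step(theta, grad, state, lr=1e-3, beta1=0.9, beta2=0.999,
                  beta3=0.9999, alpha=5.0, eps=1e-8):
    """One AdEMAmix-style update: Adam's fast EMA plus a slow EMA of old gradients."""
    state["t"] += 1
    t = state["t"]
    state["m1"] = beta1 * state["m1"] + (1 - beta1) * grad   # fast EMA, as in Adam
    state["m2"] = beta3 * state["m2"] + (1 - beta3) * grad   # slow EMA: "recycled" old gradients
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad**2  # second-moment EMA
    m1_hat = state["m1"] / (1 - beta1 ** t)                  # bias correction for the fast EMA
    v_hat = state["v"] / (1 - beta2 ** t)
    return theta - lr * (m1_hat + alpha * state["m2"]) / (np.sqrt(v_hat) + eps)

# Toy usage: minimize 0.5 * ||theta||^2, whose gradient is theta itself.
theta = np.array([1.0, -2.0])
state = {"m1": np.zeros_like(theta), "m2": np.zeros_like(theta),
         "v": np.zeros_like(theta), "t": 0}
for _ in range(200):
    theta = ademamix_step(theta, theta, state)
print(theta)
```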
My research interests include optimization, federated learning, machine learning, privacy, and unlearning.
Send your idea by the end of March 🏃♂️➡️, and run it on one of the largest public AI clusters globally. Everyone is eligible to apply!
swiss-ai.org
Slides are available on my website (link in thread).
🎉 New experiments with Llama and Gemma models in the updated paper!
Together with @danielepal.bsky.social, @matpagliardini.bsky.social, M. Jaggi and @francois.fleuret.org, we show that LLMs have a smaller effective depth than their nominal depth, which can be exploited to increase inference speed in multi-GPU settings!
arxiv.org/abs/2502.02790
(1/N)
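One way a reduced effective depth could translate into multi-GPU speedups, sketched under my own simplifying assumption (not necessarily the paper's exact scheme): if consecutive blocks contribute nearly independent residual updates, pairs of blocks can read the same input and run concurrently on different devices, halving the sequential depth.

```python
import torch
import torch.nn as nn

class Block(nn.Module):
    """Toy pre-norm block that returns only its residual update."""
    def __init__(self, d):
        super().__init__()
        self.norm = nn.LayerNorm(d)
        self.mlp = nn.Sequential(nn.Linear(d, 4 * d), nn.GELU(), nn.Linear(4 * d, d))

    def forward(self, x):
        return self.mlp(self.norm(x))

d, n_blocks = 64, 8
blocks = nn.ModuleList(Block(d) for _ in range(n_blocks))
x = torch.randn(2, 16, d)

# Sequential schedule (nominal depth): each block sees the previous block's output.
h = x
for blk in blocks:
    h = h + blk(h)

# "Shallower" schedule: each pair of consecutive blocks reads the same input and
# their residual updates are summed, so the two blocks of a pair could run
# concurrently on different GPUs.
g = x
for i in range(0, n_blocks, 2):
    g = g + blocks[i](g) + blocks[i + 1](g)

print((h - g).abs().mean())  # discrepancy between the two schedules
```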
We spent the last year (actually a bit longer) training an LLM with recurrent depth at scale.
The model has an internal latent space in which it can adaptively spend more compute to think longer.
I think the tech report ...🐦⬛
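A toy sketch of the recurrent-depth idea (illustrative, not the released architecture): a weight-tied core block is re-applied in latent space, and the number of iterations, i.e., how long the model "thinks", is chosen per input at test time.

```python
import torch
import torch.nn as nn

class RecurrentDepthLM(nn.Module):
    """Toy recurrent-depth model: one weight-tied core block re-applied in latent space."""
    def __init__(self, vocab=256, d=64):
        super().__init__()
        self.embed = nn.Embedding(vocab, d)
        self.core = nn.Sequential(nn.LayerNorm(d), nn.Linear(d, 4 * d), nn.GELU(),
                                  nn.Linear(4 * d, d))
        self.head = nn.Linear(d, vocab)

    def forward(self, tokens, num_iters=4):
        # More iterations = more compute spent in latent space, with no extra parameters;
        # num_iters can be chosen adaptively per input at inference time.
        h = self.embed(tokens)
        for _ in range(num_iters):
            h = h + self.core(h)
        return self.head(h)

model = RecurrentDepthLM()
tokens = torch.randint(0, 256, (1, 10))
fast = model(tokens, num_iters=2)       # cheap answer
thorough = model(tokens, num_iters=16)  # spend more test-time compute on the same input
print(fast.shape, thorough.shape)
```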
Particle filtering approach to improved inference w/o any training!
Check out probabilistic-inference-scaling.github.io
By Isha Puri et al. 📈🤖
Joint MIT CSAIL & Red Hat
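A minimal sketch of particle-filtering-style inference scaling (the generator and reward below are toy stand-ins, not the project's components): keep several candidate generations alive, weight them by a reward signal after each step, and resample, with no parameter updates anywhere.

```python
import numpy as np

rng = np.random.default_rng(0)

def extend(partial):
    """Stand-in for one LLM decoding step: append one toy token."""
    return partial + [int(rng.integers(0, 10))]

def reward(partial):
    """Stand-in for a (process) reward model; here it simply prefers larger tokens."""
    return float(sum(partial))

def particle_filter_decode(num_particles=8, num_steps=5, temperature=1.0):
    """Inference-time scaling via particle filtering: extend, weight, resample."""
    particles = [[] for _ in range(num_particles)]
    for _ in range(num_steps):
        particles = [extend(p) for p in particles]                 # propose one more step each
        w = np.array([reward(p) / temperature for p in particles])
        w = np.exp(w - w.max()); w /= w.sum()                      # normalized softmax weights
        idx = rng.choice(num_particles, size=num_particles, p=w)   # resample by weight
        particles = [list(particles[i]) for i in idx]
    return max(particles, key=reward)

print(particle_filter_decode())
```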
mistral.ai/news/mistral...