Quentin Gallouédec
@qgallouedec.hf.co
PhD - Research @hf.co 🤗
TRL maintainer
It started as a modest project to offer a free, open-source alternative to MuJoCo environments, and today, panda-gym is downloaded over 100k times and cited in over 100 papers. 🦾
May 2, 2025 at 11:14 PM
just pip install trl
April 26, 2025 at 10:57 PM
How many of these 8 things did you know?

huggingface.co/blog/qgallou...
Gotchas in Tokenizer Behavior Every Developer Should Know
A Blog post by Quentin Gallouédec on Hugging Face
huggingface.co
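Two classic examples of this kind of gotcha, sketched with a GPT-2 tokenizer (not necessarily items from the post's list of 8):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")  # any byte-level BPE tokenizer shows the same behavior

# Gotcha 1: a leading space changes the tokens, so "Hello" and " Hello" get different ids.
print(tok("Hello")["input_ids"])
print(tok(" Hello")["input_ids"])

# Gotcha 2: token strings are byte-level, so they contain artifacts like "Ġ" standing in for spaces.
print(tok.convert_ids_to_tokens(tok(" Hello world")["input_ids"]))  # e.g. ['ĠHello', 'Ġworld']
```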
April 20, 2025 at 6:21 PM
🚀 TRL 0.14 – Featuring GRPO! 🚀

TRL 0.14 brings *GRPO*, the RL algorithm behind 🐳 DeepSeek-R1.

⚡ Blazing fast generation with vLLM integration.
📉 Optimized training with DeepSpeed ZeRO 1/2/3.
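For context, a minimal GRPO training sketch in the spirit of the TRL quick start; the model, dataset, and toy reward below are placeholders, and vLLM generation is an opt-in setting in the config:

```python
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# Toy reward function: GRPO only needs something that scores a batch of completions.
def reward_len(completions, **kwargs):
    return [-abs(50 - len(c)) for c in completions]

dataset = load_dataset("trl-lib/tldr", split="train")  # placeholder prompt dataset

trainer = GRPOTrainer(
    model="Qwen/Qwen2-0.5B-Instruct",  # placeholder model
    reward_funcs=reward_len,           # a reward model id also works here
    args=GRPOConfig(output_dir="Qwen2-0.5B-GRPO"),
    train_dataset=dataset,
)
trainer.train()
```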
January 30, 2025 at 2:54 PM
Reposted by Quentin Gallouédec
The most impactful open-source project of today (according to Vercel's VP of AI)
=> huggingface.co/blog/open-r1
Open-R1: a fully open reproduction of DeepSeek-R1
We’re on a journey to advance and democratize artificial intelligence through open source and open science.
huggingface.co
January 28, 2025 at 12:17 PM
Last moments of closed-source AI 🪦:
Hugging Face is openly reproducing the pipeline of 🐳 DeepSeek-R1. Open data, open training, open models, open collaboration.

🫵 Let's go!
github.com/huggingface/...
GitHub - huggingface/open-r1: Fully open reproduction of DeepSeek-R1
Fully open reproduction of DeepSeek-R1. Contribute to huggingface/open-r1 development by creating an account on GitHub.
github.com
January 25, 2025 at 2:36 PM
The algorithm behind DeepSeek's R1 model (aka GRPO) now lives in TRL main branch! Go and test it!
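To test it before the next release, installing TRL from source should do the trick: pip install git+https://github.com/huggingface/trl.git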
January 22, 2025 at 3:07 PM
[Stonks] TRL is a Python library for training language models.

It has seen impressive growth this year. Lots of new features and an improved codebase have translated into increased usage. You can count on us to do even more in 2025.
January 6, 2025 at 5:26 PM
🎅 Santa Claus has delivered the ultimate guide to understanding OOM errors (link in comment)
December 24, 2024 at 11:04 AM
#1 Python dev today. Third time since September 🫨
December 17, 2024 at 6:32 PM
🚨 TRL 0.13 is out! 🤗

Featuring a Process-supervised Reward Model (PRM) Trainer 🏋️

PRMs empower LLMs to "think before answering"—a key feature behind OpenAI's o1 launch just two weeks ago. 🚀
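A minimal PRM training sketch along the lines of the TRL docs; the model and dataset below are placeholders, and the dataset is expected in TRL's stepwise-supervision format (a prompt, a list of completion steps, and a per-step correctness label):

```python
from datasets import load_dataset
from transformers import AutoModelForTokenClassification, AutoTokenizer
from trl import PRMConfig, PRMTrainer

model_id = "Qwen/Qwen2-0.5B-Instruct"  # placeholder base model
model = AutoModelForTokenClassification.from_pretrained(model_id, num_labels=2)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Placeholder dataset with per-step labels (stepwise-supervision format).
train_dataset = load_dataset("trl-lib/math_shepherd", split="train")

trainer = PRMTrainer(
    model=model,
    args=PRMConfig(output_dir="Qwen2-0.5B-PRM"),
    processing_class=tokenizer,
    train_dataset=train_dataset,
)
trainer.train()
```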
December 17, 2024 at 4:07 PM
Reposted by Quentin Gallouédec
We outperform Llama 70B with Llama 3B on hard math by scaling test-time compute 🔥

How? By combining step-wise reward models with tree search algorithms :)

We're open sourcing the full recipe and sharing a detailed blog post 👇
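The rough idea, as I understand it: spend inference compute searching over reasoning steps and let a PRM keep the most promising partial solutions. A toy sketch of that control flow (the generator and scorer are hypothetical stubs, not the recipe from the blog post):

```python
import random

def sample_next_steps(problem: str, steps: list[str], n: int = 4) -> list[str]:
    # Stub: in practice, sample n candidate next reasoning steps from the policy LLM.
    return [f"step {len(steps) + 1} (variant {i})" for i in range(n)]

def prm_score(problem: str, steps: list[str]) -> float:
    # Stub: in practice, a process reward model scores how likely the partial solution is correct.
    return random.random()

def beam_search_with_prm(problem: str, beam_width: int = 4, max_steps: int = 6) -> list[str]:
    """Keep the beam_width partial solutions the PRM likes best at every depth."""
    beams: list[list[str]] = [[]]
    for _ in range(max_steps):
        candidates = [b + [s] for b in beams for s in sample_next_steps(problem, b)]
        candidates.sort(key=lambda b: prm_score(problem, b), reverse=True)
        beams = candidates[:beam_width]
    return beams[0]

print(beam_search_with_prm("What is 12 * 34?"))
```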
December 16, 2024 at 5:08 PM
The number of TRL models on the 🤗 Hub has grown 60x this year! 📈
How about doing the same next year?
December 3, 2024 at 12:55 PM
Reposted by Quentin Gallouédec
We took those TRL notebooks from last week and made a page from them. So if you're upskilling on finetuning or aligning LLMs and want examples from the community (like Maxime Labonne, Philipp Schmid, and Sergio Paniego Blanco), check it out!

bsky.app/profile/benb...

>> huggingface.co/docs/trl/mai...
December 2, 2024 at 9:18 AM
Join us at Hugging Face as an intern if you want to contribute to amazing open-source projects and develop the best LLM fine-tuning library, aka TRL.

🧑‍💻 Full remote
🤯 Exciting subjects
🌍 Anywhere in the world
🤸🏻 Flexible working hours

Link to apply in comment 👇
November 27, 2024 at 3:49 PM
Reposted by Quentin Gallouédec
We’re looking for an intern to join our SmolLM team! If you’re excited about training LLMs and building high-quality datasets, we’d love to hear from you. 🤗

US: apply.workable.com/huggingface/...
EMEA: apply.workable.com/huggingface/...
ML Research Engineer Internship, SmolLMs pretraining and datasets - EMEA Remote - Hugging Face
Here at Hugging Face, we’re on a journey to advance good Machine Learning and make it more accessible. Along the way, we contribute to the development of technology for the better.We have built the fa...
apply.workable.com
November 27, 2024 at 10:20 AM
I'd love to! We have a lot of room for improvement here!
These tutorials provide a comprehensive but concise roadmap through TRL across the main fine-tuning and alignment classes.

🤔 Let me know if you would like a dedicated course on TRL basics.
November 25, 2024 at 10:43 AM
Reposted by Quentin Gallouédec
It's Sunday morning, so I'm taking a minute for a nerdy thread (on math, tokenizers, and LLMs) about the work of our intern Garreth

By adding a few lines of code to the base Llama 3 tokenizer, he got a free boost in arithmetic performance 😮

[thread]
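The thread has the actual details; purely as a hypothetical illustration of how a small number-handling tweak around a tokenizer can look, here's a sketch that regroups digits in threes from the right (not necessarily what was done here):

```python
import re

def regroup_digits(text: str) -> str:
    """Hypothetical preprocessing: regroup each number's digits in threes from the right,
    e.g. '1234567' -> '1 234 567', so digit groups line up with place value."""
    def fix(match: re.Match) -> str:
        s = match.group(0)
        groups = []
        while s:
            groups.append(s[-3:])
            s = s[:-3]
        return " ".join(reversed(groups))
    return re.sub(r"\d+", fix, text)

print(regroup_digits("12345 + 678 = 13023"))  # -> "12 345 + 678 = 13 023"
```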
November 24, 2024 at 11:05 AM
How can you avoid the temptation to use a subprocess for sub-commands?

This blog post from @muellerzr.bsky.social saved my day.

muellerzr.github.io/til/argparse...
Zach Mueller - Calling argparse without subprocess
How to use argparse without the CLI
muellerzr.github.io
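The trick, in short: parse_args accepts an explicit list of arguments, so you can reuse a script's argparse parser (and call its main function) in-process instead of shelling out. A minimal sketch with a made-up parser:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Stand-in for the parser an existing CLI script already defines.
    parser = argparse.ArgumentParser("train")
    parser.add_argument("--lr", type=float, default=1e-4)
    parser.add_argument("--epochs", type=int, default=1)
    return parser

# Subprocess-free: pass an explicit argv list instead of letting argparse read sys.argv.
args = build_parser().parse_args(["--lr", "3e-5", "--epochs", "2"])
print(args.lr, args.epochs)
```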
November 22, 2024 at 7:02 PM
Finetune SmolLM2 with TRL!
Here's a notebook where I run SFT on SmolLM2 with the synthetic dataset: colab.research.google.com/drive/1lioed...

thanks @philschmid.bsky.social for the finetuning code
thanks @huggingface.bsky.social for the smol model
thanks @qgallouedec.bsky.social and friends for TRL
Google Colab
colab.research.google.com
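An SFT run with TRL typically boils down to a few lines; a minimal sketch with a placeholder dataset (the notebook linked above uses its own synthetic one):

```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("trl-lib/Capybara", split="train")  # placeholder chat dataset

trainer = SFTTrainer(
    model="HuggingFaceTB/SmolLM2-135M-Instruct",
    args=SFTConfig(output_dir="SmolLM2-135M-SFT"),
    train_dataset=dataset,
)
trainer.train()
```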
November 21, 2024 at 11:32 AM
Reposted by Quentin Gallouédec
When XetHub joined Hugging Face, we brainstormed how to share our tech with the community.

The magic? Versioning chunks, not files, giving rise to:

🧠 Smarter storage
⏩ Faster uploads
🚀 Efficient downloads

Curious? Read the blog and let us know how it could help your workflows!
From Files to Chunks: Improving HF Storage Efficiency
We’re on a journey to advance and democratize artificial intelligence through open source and open science.
huggingface.co
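The key idea: split files into content-defined chunks and deduplicate by chunk hash, so editing a file only re-uploads the chunks that actually changed. A toy sketch of the concept (the real system uses a proper rolling hash and a remote chunk store):

```python
import hashlib

def content_defined_chunks(data: bytes, mask: int = (1 << 12) - 1, min_size: int = 64) -> list[bytes]:
    """Cut a chunk whenever a toy rolling value hits a boundary pattern, so boundaries
    depend on the bytes themselves rather than on fixed offsets."""
    chunks, start, h = [], 0, 0
    for i, b in enumerate(data):
        h = ((h << 1) ^ b) & 0xFFFFFFFF  # stand-in for a real rolling hash (e.g. gear hashing)
        if (h & mask) == 0 and i - start >= min_size:
            chunks.append(data[start : i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])
    return chunks

def upload(data: bytes, store: dict[str, bytes]) -> int:
    """'Upload' only chunks whose hash the store has not seen before; return how many were new."""
    new = 0
    for chunk in content_defined_chunks(data):
        key = hashlib.sha256(chunk).hexdigest()
        if key not in store:
            store[key] = chunk
            new += 1
    return new

store: dict[str, bytes] = {}
v1 = bytes(range(256)) * 64
print(upload(v1, store))            # every chunk is new the first time
print(upload(v1 + b"tail", store))  # appending data only introduces a few new chunks at the end
```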
November 20, 2024 at 6:51 PM