Lightnews — Scholar-powered news

Antonin Raffin

@araffin.bsky.social

3.3K followers 240 following 95 posts

Researcher in robotics and machine learning (Reinforcement Learning). Maintainer of Stable-Baselines (SB3).

https://araffin.github.io/


Antonin Raffin

@araffin.bsky.social

A wonderful collection of spurious correlations, correlation is not causation.

link: www.tylervigen.com/spurious-cor...

found via @stefanjudis.com newsletter

October 21, 2025 at 5:44 AM

Antonin Raffin

@araffin.bsky.social

Training a small humanoid robot with reinforcement learning, using another robot to handle resets.

by Kaizhe Hu et al. (ToddlerBot Stanford)

Project page: robot-trains-robot.github.io

A robot arm supports a humanoid robot on a treadmill

September 29, 2025 at 8:48 AM

Antonin Raffin

@araffin.bsky.social

Get ready to watch an astronaut from the ISS 🛰️ control our robots tomorrow at 13:20 (Munich Time), one of them being our favorite little happy hopper Bert =)

Link: m.youtube.com/watch?v=m4Y5...

July 23, 2025 at 2:32 PM

Antonin Raffin

@araffin.bsky.social

*you need dark mode for that

(it's then written white on black)

A screenshot of the HTML version with the prompt injection written in white on black

July 5, 2025 at 8:30 PM

Antonin Raffin

@araffin.bsky.social

There are some milestones to celebrate for Stable-Baselines3 (SB3) 🎉!

- 10k+ stars on @github.com
- 9M+ downloads on @pypi.org
- 3300+ citations for the JMLR paper
- 1000 citations for the SB2 repository

I would especially like to thank the maintainers and our contributors!

July 2, 2025 at 3:53 PM

Antonin Raffin

@araffin.bsky.social

RL Is Hard (Episode #35753)

Debugging why PPO SB3 (PyTorch) was working where PPO SBX (Jax) was failing.

Just one line of code differed... 🙈 (the initialization of the actor network output)

Learning curves for PPO SB3 and PPO SBX (JAX): one is learning and the other is not. The one-line code diff is also shown.

June 23, 2025 at 11:25 AM
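The diff itself is not reproduced in the post, so here is only a sketch of the kind of line involved. As far as we know, SB3's `ActorCriticPolicy` with `ortho_init=True` initializes the action output layer orthogonally with a small gain (0.01), so the initial action means start close to zero. The NumPy code below illustrates that effect; the function name, the layer sizes and the random "features" are hypothetical and not taken from SB3 or SBX.

```python
import numpy as np


def orthogonal_init(shape, gain, rng):
    """Orthogonal weight matrix scaled by `gain` (QR-based construction)."""
    rows, cols = shape
    a = rng.standard_normal((max(rows, cols), min(rows, cols)))
    q, r = np.linalg.qr(a)           # q has orthonormal columns
    q = q * np.sign(np.diag(r))      # sign fix so the result is uniformly distributed
    if rows < cols:
        q = q.T                      # now the rows are orthonormal
    return gain * q


rng = np.random.default_rng(0)
hidden_dim, action_dim = 64, 6       # hypothetical sizes
# Hypothetical last hidden activations (tanh keeps each entry in [-1, 1]).
features = np.tanh(rng.standard_normal((1000, hidden_dim)))

for gain in (0.01, 1.0):
    w_out = orthogonal_init((action_dim, hidden_dim), gain, rng)  # action-mean layer
    action_mean = features @ w_out.T
    print(f"gain={gain}: max |initial action mean| = {np.abs(action_mean).max():.3f}")
```

With orthonormal rows, each output norm is bounded by `gain * ||h||`, so a 0.01 gain keeps the initial policy mean near zero regardless of the hidden activations, while a gain of 1.0 does not.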

Antonin Raffin

@araffin.bsky.social

New in SB3 (master branch): to use n-step returns for off-policy algorithms, simply pass `n_steps=3` (it will instantiate an `NStepReplayBuffer` behind the scenes).

Our implementation does not use any for loops, only NumPy vectorised operations at sampling time (so the data container stays the same).

A small code snippet showing how to use the `n_steps` parameter with SB3 lib

June 16, 2025 at 10:52 AM
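To illustrate the idea of computing n-step returns at sampling time without Python loops, here is a minimal NumPy sketch. It is not SB3's `NStepReplayBuffer` code: the function name is hypothetical, and it assumes a flat buffer with no wrap-around past the write position and no timeout (truncation) handling. The critic target would then be `returns + bootstrap_discount * Q(next_obs[last_idx], ...)`.

```python
import numpy as np


def n_step_targets(rewards, dones, idx, n_steps, gamma):
    """Vectorised n-step returns for sampled start indices (hypothetical helper)."""
    offsets = np.arange(n_steps)
    steps = idx[:, None] + offsets[None, :]        # (batch, n_steps) buffer indices
    r = rewards[steps]
    d = dones[steps].astype(np.float64)
    # Keep step k only if no episode ended at steps 0..k-1
    # (the terminal step itself is still included).
    done_before = np.cumsum(d, axis=1) - d
    mask = (done_before == 0).astype(np.float64)
    returns = np.sum(mask * (gamma ** offsets) * r, axis=1)
    n_used = mask.sum(axis=1).astype(int)          # always >= 1
    terminated = np.any(d * mask > 0, axis=1)
    # Factor for the bootstrapped Q-value (zero after a terminal state).
    bootstrap_discount = gamma ** n_used * (1.0 - terminated)
    last_idx = steps[np.arange(len(idx)), n_used - 1]
    return returns, bootstrap_discount, last_idx


rewards = np.ones(10)
dones = np.zeros(10, dtype=bool)
dones[2] = True  # an episode ends at index 2
returns, discount, last_idx = n_step_targets(
    rewards, dones, np.array([0, 1, 3]), n_steps=3, gamma=0.5
)
# returns  == [1.75, 1.5, 1.75]
# discount == [0.0, 0.0, 0.125]
# last_idx == [2, 2, 5]
```

Only the indexing changes at sampling time, so the stored transitions themselves stay one-step.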

Antonin Raffin

@araffin.bsky.social

For a small OpenCV script to play with the latent space, have a look at github.com/araffin/aae-...

A GUI showing how changing variables in the latent space affects the decoder output of an auto-encoder

February 27, 2025 at 5:31 PM

Antonin Raffin

@araffin.bsky.social

Publication-ready visualization of 3D objects and point clouds in seconds, using @blender.org and BlenderProc.

hummat.github.io/bproc-pubvis/

A screenshot of output from the BlenderProc tool, with associated options. Left: mesh; middle: point cloud; right: depth.

December 9, 2024 at 10:11 AM

Antonin Raffin

@araffin.bsky.social

If you want to use Stable-Baselines3 (or SBX) with DeepMind Control RL environments (dm suite), this is all you need to do.

Gist: gist.github.com/araffin/534f...

Btw, I'm looking for contributors to add dm control tasks to the RL Zoo.

Code to create a dm control env and convert it to a gym env compatible with SB3

November 24, 2024 at 6:06 PM
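The gist itself is not reproduced here. As a rough sketch of the core conversion such a wrapper needs, assuming dm_env conventions: observations come as a dict of arrays, `reward` is `None` on the first step, and `discount == 0` marks a true terminal state while a time-limit end keeps a non-zero discount. A complete Gymnasium wrapper would also define `observation_space` and `action_space`; the helper names and the `FakeTimeStep` class below are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

import numpy as np


def flatten_observation(obs_dict):
    """Concatenate a dm_control-style dict of arrays into one float32 vector."""
    return np.concatenate(
        [np.asarray(v, dtype=np.float32).ravel() for v in obs_dict.values()]
    )


def timestep_to_gym_step(timestep):
    """Map a dm_env-style TimeStep to a Gymnasium-style 5-tuple."""
    obs = flatten_observation(timestep.observation)
    reward = float(timestep.reward or 0.0)  # reward is None on the first step
    done = bool(timestep.last())
    # discount == 0 means a real terminal state; otherwise the episode
    # was cut by the time limit, which Gymnasium calls truncation.
    terminated = done and timestep.discount == 0.0
    truncated = done and not terminated
    return obs, reward, bool(terminated), bool(truncated), {}


@dataclass
class FakeTimeStep:
    """Hypothetical stand-in for dm_env.TimeStep, for illustration only."""
    observation: dict
    reward: Optional[float]
    discount: Optional[float]
    is_last: bool

    def last(self):
        return self.is_last


obs_dict = {"position": [1.0, 2.0], "velocity": [[3.0]]}
time_limit_end = FakeTimeStep(obs_dict, reward=0.5, discount=1.0, is_last=True)
obs, reward, terminated, truncated, info = timestep_to_gym_step(time_limit_end)
# obs == [1.0, 2.0, 3.0], terminated is False, truncated is True
```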

Antonin Raffin

@araffin.bsky.social

On Friday, I'll be in Nancy at the Humanoid Soccer Robots Workshop (Humanoids Conference 2024) to talk about "Ingredients for Learning Locomotion Directly on Real Hardware".

My colleagues from DLR will also present some robot demos.
Slides: araffin.github.io/talk/ingredi...

November 19, 2024 at 5:13 PM

Antonin Raffin

@araffin.bsky.social

Quick Python tip to improve error messages.

When you raise an exception for an invalid value, give the user some feedback on what exactly that value was.

We tried to follow this principle in our gym env checker: github.com/DLR-RM/stabl...

An example of raising more informative errors: when name is a tuple, the message also prints its value instead of only complaining that it is not a string: f"'name' must be a string, not of type {type(name)}: {name}" vs "'name' must be a string"

November 11, 2024 at 4:32 PM
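A minimal sketch of the tip, reusing the message from the screenshot. The `check_name` and `check_n_envs` helpers are hypothetical and not taken from the SB3 env checker; the second one shows the `!r` conversion, which prints the value's `repr` so that strings keep their quotes.

```python
def check_name(name):
    # Report the actual type *and* value, not only what was expected.
    if not isinstance(name, str):
        raise TypeError(f"'name' must be a string, not of type {type(name)}: {name}")
    return name


def check_n_envs(n_envs):
    # !r shows e.g. '4' (a string) differently from 4 (an int).
    if not isinstance(n_envs, int) or n_envs < 1:
        raise ValueError(f"n_envs must be a positive integer, got {n_envs!r}")
    return n_envs


try:
    check_name(("left", "right"))
except TypeError as err:
    message = str(err)
    print(message)
    # 'name' must be a string, not of type <class 'tuple'>: ('left', 'right')
```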

Antonin Raffin

@araffin.bsky.social

Post your most popular 🐦 from Twitter

Types of Reinforcement Learning Paper
Original image: @xkcd.com

Types of reinforcement learning papers, using xkcd original artwork

October 24, 2024 at 11:16 PM
