Vincent Francois-Lavet
vinfl.bsky.social
Assistant Professor in machine learning @VUAmsterdam
Abstract representations+reinforcement learning.
Reposted by Vincent Francois-Lavet
NEW: OpenAI is releasing two free open models today, ahead of the GPT-5 launch. One of the open-weight "GPT-OSS" models is small enough to run on a laptop. More from @alexeheath.com 👇 www.theverge.com/openai/71878...
OpenAI releases a free GPT model that can run right on your laptop
GPT-OSS is OpenAI’s first open-weight model in six years.
www.theverge.com
August 5, 2025 at 5:01 PM
Reposted by Vincent Francois-Lavet
According to new research by Waymo, self-driving cars' neural nets follow power-law scaling: more data and compute = better performance. waymo.com/blog/2025/06...
June 14, 2025 at 1:31 AM
Reposted by Vincent Francois-Lavet
Major reasoning models trained with RL so far, with technical reports:

2025-01-22 — DeepSeek R1 — arxiv.org/abs/2501.12948
2025-01-22 — Kimi 1.5 — arxiv.org/abs/2501.12599
2025-03-31 — Open-Reasoner-Zero — arxiv.org/abs/2503.24290
2025-04-10 — Seed 1.5-Thinking — arxiv.org/abs/2504.13914
...
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT)...
arxiv.org
June 11, 2025 at 3:17 AM
State of AI in 4 plots.

The ~200 Elo point difference between recent models and a model that is two years old means that a human rater has a ~75% chance of preferring the recent model's answer.
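That ~75% figure follows directly from the standard logistic Elo formula; a quick sketch (illustrative only, not tied to any specific leaderboard's exact scoring):

```python
# Expected probability that the higher-rated model's answer is preferred,
# given an Elo rating gap, using the standard logistic Elo formula.
def elo_win_prob(gap: float) -> float:
    """Win/preference probability implied by an Elo gap of `gap` points."""
    return 1.0 / (1.0 + 10.0 ** (-gap / 400.0))

print(round(elo_win_prob(200), 2))  # 0.76
```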

Based on available data, all indicators about the progress of AI (in particular LLMs) remain strong.
June 4, 2025 at 2:17 PM
Not long ago, people laughed at the idea of AI generating minutes-long realistic videos. Now it's reality with tools like Sora and Veo 3 leading the way. Full movies in cinemas soon, generated from just a few prompts...
May 25, 2025 at 2:21 PM
📢New paper on arXiv: Hadamax Encoding: Elevating Performance in Model-Free Atari. (arxiv.org/abs/2505.15345)

Our Hadamax (Hadamard max-pooling) encoder architecture improves the recent PQN algorithm’s Atari performance by 80%, allowing it to significantly surpass Rainbow-DQN!
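A speculative toy sketch of what the name "Hadamard max-pooling" suggests: max-pool two activation branches, then combine them with an element-wise (Hadamard) product. This is only a reading of the name for illustration; see the paper for the actual Hadamax encoder design.

```python
def max_pool_2x2(x):
    """2x2 max-pooling over a 2D list of floats (even dimensions assumed)."""
    return [
        [max(x[i][j], x[i][j + 1], x[i + 1][j], x[i + 1][j + 1])
         for j in range(0, len(x[0]), 2)]
        for i in range(0, len(x), 2)
    ]

def hadamax(a, b):
    """Element-wise (Hadamard) product of the two max-pooled branches."""
    pa, pb = max_pool_2x2(a), max_pool_2x2(b)
    return [[u * v for u, v in zip(ra, rb)] for ra, rb in zip(pa, pb)]

a = [[float(4 * i + j) for j in range(4)] for i in range(4)]
ones = [[1.0] * 4 for _ in range(4)]
print(hadamax(a, ones))  # [[5.0, 7.0], [13.0, 15.0]]
```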
May 22, 2025 at 11:33 AM
Reposted by Vincent Francois-Lavet
Just shared a new article on "The State of Reinforcement Learning for LLM Reasoning"!
If you are new to reinforcement learning, this article has a generous intro section (PPO, GRPO, etc.).
Also, I cover 15 recent articles focused on RL & Reasoning.

🔗 magazine.sebastianraschka.com/p/the-state-...
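The core trick in GRPO-style methods is simple enough to sketch: sample several completions per prompt, score each, and normalize the rewards within the group to get advantages, with no learned value function. A toy version (real implementations add clipping, KL penalties, etc.):

```python
from statistics import mean, stdev

def group_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Group-relative advantages: reward minus group mean, scaled by group std."""
    mu, sigma = mean(rewards), stdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Four sampled completions for one prompt: two correct (1.0), two wrong (0.0).
print([round(a, 2) for a in group_advantages([1.0, 0.0, 1.0, 0.0])])
# [0.87, -0.87, 0.87, -0.87]
```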
April 19, 2025 at 1:48 PM
Reposted by Vincent Francois-Lavet
This is indeed a great position paper, I like it a lot:
- pre-training with next-token prediction creates local minima in reasoning that we can't escape => pre-training should also be done with RL
- long context windows lead to exploitation of spurious correlations
- disentangle reasoning and knowledge
April 17, 2025 at 7:04 AM
Reposted by Vincent Francois-Lavet
The funny thing about multimodal image generation, as released in the last week by Google and OpenAI, is that LLM image generation now works the way most people using LLMs over the past two years always assumed it worked.
March 26, 2025 at 1:17 AM
Reposted by Vincent Francois-Lavet
www.youtube.com/watch?v=9_Pe... An interview with Rich Sutton. His humility is truly inspiring: "There are no authorities in science". I wish people would listen and live by this.
TURING AWARD WINNER Richard S. Sutton in Conversation with Cam Linke | No Authorities in Science
YouTube video by Amii
www.youtube.com
March 6, 2025 at 8:50 PM
Reposted by Vincent Francois-Lavet
check this out: new postdoc program for AI-related research in Catalunya!

our group is looking to hire within this program, ideally to work on topics related to RL theory. in case you're interested, pls DM or email me.

(retweets appreciated!)

ramonllull-aira.eu/application
Ramon Llull AIRA Open Calls
Open Calls In our inaugural call scheduled for December 2024, we aim to select up to 17 exceptional postdoctoral fellows, with an additional 16 to be chosen in Call 2 in 2025. 20  December 2024 ...
ramonllull-aira.eu
January 22, 2025 at 4:55 PM
Reposted by Vincent Francois-Lavet
How DeepSeek R1's Multi-round Conversation works.

api-docs.deepseek.com/guides/reaso...
January 20, 2025 at 5:04 PM
Reposted by Vincent Francois-Lavet
Bombshell from DeepSeek: the R1 family of models. Incredibly, it's MIT licensed and they encourage us to distill from it.

The core of the approach is reinforcement learning from verifiable rewards. No PRMs / MCTS. R1-zero doesn't even use SFT to start.
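"Verifiable rewards" means the reward signal comes from checking the answer against ground truth, not from a learned reward model. A minimal sketch (the `\boxed{...}` convention and the helper names are illustrative assumptions, not DeepSeek's actual code):

```python
import re

def extract_final(completion: str) -> str:
    """Pull the text inside \\boxed{...}, a common convention for math answers."""
    m = re.search(r"\\boxed\{([^}]*)\}", completion)
    return m.group(1).strip() if m else ""

def verifiable_reward(completion: str, gold: str) -> float:
    """1.0 if the extracted final answer matches the reference, else 0.0."""
    return 1.0 if extract_final(completion) == gold else 0.0

print(verifiable_reward(r"... so the answer is \boxed{42}", "42"))  # 1.0
```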
January 20, 2025 at 3:35 PM
Reposted by Vincent Francois-Lavet
I probably don’t need to tell you that 2024 was a huge year for robotics. As a long-time robotics researcher, it’s been amazing to watch; some of the things that I always dreamed about actually seem to be happening.

For me, there are three big stories: itcanthink.substack.com/p/2024-robot...
2024 Robotics Year in Review
Robotics finally feels like it's happening
itcanthink.substack.com
January 2, 2025 at 6:15 PM
Reposted by Vincent Francois-Lavet
Super happy to reveal our new paper! 🎉🙌♟️

We trained a model to play four games, and the performance in each increases by "external search" (MCTS using a learned world model) and "internal search" where the model outputs the whole plan on its own!
December 5, 2024 at 9:09 AM
Reposted by Vincent Francois-Lavet
RLDM will be held next year in Dublin!

A reminder that the call for workshops is out: rldm.org/call-for-wor...

The workshops are one of my favourite parts of the conference :) please get in touch if you have any questions!
November 22, 2024 at 9:57 AM
Hello, world! You seem a bit wilder than I expected, but here we are.
November 18, 2024 at 7:15 PM