Abstract representations + reinforcement learning.
2025-01-22 — DeepSeek R1 — arxiv.org/abs/2501.12948
2025-01-22 — Kimi 1.5 — arxiv.org/abs/2501.12599
2025-03-31 — Open-Reasoner-Zero — arxiv.org/abs/2503.24290
2025-04-10 — Seed 1.5-Thinking — arxiv.org/abs/2504.13914
...
The 200 Elo-point gap between recent models and a two-year-old model means a human rater has a ~76% chance of preferring the recent model's answer.
Based on available data, all indicators of AI progress (LLMs in particular) remain strong.
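That "~76%" follows directly from the standard logistic Elo formula, which can be checked in a couple of lines:

```python
# Win probability implied by an Elo rating gap, using the standard
# logistic Elo formula: P(win) = 1 / (1 + 10^(-diff / 400)).
def elo_win_probability(rating_diff: float) -> float:
    return 1.0 / (1.0 + 10.0 ** (-rating_diff / 400.0))

print(round(elo_win_probability(200), 3))  # 0.76: a 200-point gap -> ~76% preference rate
```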
Our Hadamax (Hadamard max-pooling) encoder architecture improves the recent PQN algorithm’s Atari performance by 80%, allowing it to significantly surpass Rainbow-DQN!
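To give a rough intuition for the name: a "Hadamard max-pooling" block fuses two parallel branches with an element-wise (Hadamard) product and then max-pools the result. The sketch below is an illustrative NumPy toy, not the paper's exact encoder; the shapes, GELU activation, and 1D pooling are all assumptions.

```python
import numpy as np

def gelu(x):
    # tanh approximation of GELU (assumed activation, for illustration only)
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def hadamard_maxpool_block(x, w1, w2, pool=2):
    # Two parallel projections, fused by an element-wise (Hadamard)
    # product, then max-pooled along the first axis; a hypothetical
    # sketch of the idea, not the Hadamax architecture itself.
    a = gelu(x @ w1)
    b = gelu(x @ w2)
    h = a * b                       # Hadamard product of the two branches
    t, d = h.shape
    return h[: t - t % pool].reshape(t // pool, pool, d).max(axis=1)

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 16))
out = hadamard_maxpool_block(x, rng.normal(size=(16, 32)), rng.normal(size=(16, 32)))
print(out.shape)  # (4, 32)
```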
If you are new to reinforcement learning, this article has a generous intro section (PPO, GRPO, etc.).
I also cover 15 recent articles focused on RL & reasoning.
🔗 magazine.sebastianraschka.com/p/the-state-...
- pre-training with next-token prediction creates local minima in reasoning that we can't escape => pre-training should also be done with RL
- long context windows lead to exploitation of spurious correlations
- disentangle reasoning and knowledge
Our group is looking to hire within this program, ideally to work on topics related to RL theory. If you're interested, please DM or email me.
(retweets appreciated!)
ramonllull-aira.eu/application
The core of the approach is reinforcement learning from verifiable rewards. No PRMs / MCTS. R1-zero doesn't even use SFT to start.
For me, there are three big stories: itcanthink.substack.com/p/2024-robot...
We trained a single model to play four games, and performance in each improves with both "external search" (MCTS using a learned world model) and "internal search," where the model outputs the whole plan on its own!
A reminder that the call for workshops is out: rldm.org/call-for-wor...
The workshops are one of my favourite parts of the conference :) please get in touch if you have any questions!