NotSergeyLevine
@notsergeylevine.bsky.social
Bringing the Sergey posts until he does it himself.

Robotics. Reinforcement learning. AI.
With FAST, we can train dexterous generalist policies via simple next token prediction, and get a 5x training speed-up over prior state of the art!
January 24, 2025 at 11:35 PM
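For intuition, here is a minimal sketch of what "train via simple next-token prediction" can look like once action chunks have been discretized into tokens. A FAST-style tokenizer is assumed to exist and is abstracted away; the model interface, names, and shapes are illustrative, not the official implementation.

```python
# A minimal sketch, assuming a decoder-only policy whose forward pass returns
# per-position logits and a tokenizer that has already turned action chunks
# into discrete tokens; not the official FAST code.
import torch
import torch.nn.functional as F

def action_next_token_loss(model, obs_tokens, action_tokens):
    """Cross-entropy next-token prediction over tokenized action chunks.

    obs_tokens:    (batch, n_obs)  prompt tokens (images/language already tokenized)
    action_tokens: (batch, n_act)  discrete action tokens from a FAST-style tokenizer
    """
    # Condition on the observation plus all but the last action token.
    inputs = torch.cat([obs_tokens, action_tokens[:, :-1]], dim=1)
    logits = model(inputs)  # (batch, n_obs + n_act - 1, vocab)
    # Positions n_obs-1 onward predict the n_act action tokens.
    action_logits = logits[:, obs_tokens.shape[1] - 1:, :]
    return F.cross_entropy(
        action_logits.reshape(-1, action_logits.shape[-1]),
        action_tokens.reshape(-1),
    )
```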
How do we train vision-language-action (VLA) models with RL data? Distilling specialized RL policies into a generalist VLA (e.g., OpenVLA) works wonders for training VLAs to be fast & precise. In new work led by @CharlesXu0124, we present RLDG, which trains VLAs with RL data🧵👇
December 13, 2024 at 4:37 PM
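A hedged sketch of the distillation recipe the post describes: roll out a specialist RL policy, keep the successful trajectories, and fine-tune the generalist VLA on them with plain supervised learning. All function names here (rl_policy.act, vla.bc_loss, the success flag) are illustrative placeholders, not the RLDG codebase.

```python
# Sketch of distilling a specialist RL policy into a generalist VLA via
# supervised fine-tuning on its successful rollouts.
def collect_rl_rollouts(rl_policy, env, num_episodes):
    data = []
    for _ in range(num_episodes):
        obs, done, traj, info = env.reset(), False, [], {}
        while not done:
            action = rl_policy.act(obs)
            next_obs, reward, done, info = env.step(action)
            traj.append((obs, action))
            obs = next_obs
        if info.get("success", False):   # keep only successful episodes
            data.extend(traj)
    return data

def distill_into_vla(vla, optimizer, data, num_epochs=1):
    for _ in range(num_epochs):
        for obs, action in data:
            loss = vla.bc_loss(obs, action)  # e.g., token-level cross-entropy on the action
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```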
This turns out to be much better than prior offline-to-online methods, which need to keep using pessimistic updates because they retain the offline data. WSRL's empirical performance is very good, even though the method is so simple.
December 11, 2024 at 3:03 PM
Prior methods for offline RL followed by online finetuning generally break down if we don't retain the offline data: essentially, the offline data is needed to "support" the knowledge from offline training, and if we remove it, the methods quickly collapse in the online phase.
December 11, 2024 at 3:02 PM
Can we finetune policies from offline RL *without retaining the offline data*? We typically keep the offline data around when finetuning online. It turns out we can avoid retaining it and get a much better offline-to-online algorithm, as discussed in zhouzypaul.github.io's new paper: 🧵👇
December 11, 2024 at 3:01 PM
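Reading the three posts above together, a minimal sketch of the no-retention finetuning recipe might look like the following: initialize the actor and critic from the offline-pretrained checkpoint, seed a fresh replay buffer with a short online warmup, then run standard online RL without ever sampling the offline dataset. The warmup detail and all names (buffer.add, critic.update, ...) are assumptions for illustration, not the paper's exact algorithm.

```python
# Sketch: online finetuning that never touches the offline dataset.
def finetune_without_offline_data(env, actor, critic, buffer,
                                  warmup_steps, total_steps):
    obs = env.reset()
    for step in range(total_steps):
        action = actor.act(obs)
        next_obs, reward, done, _ = env.step(action)
        buffer.add(obs, action, reward, next_obs, done)
        obs = env.reset() if done else next_obs

        # During warmup we only collect data with the pretrained policy;
        # afterwards, update with an ordinary (non-pessimistic) online loss.
        if step >= warmup_steps:
            batch = buffer.sample()
            critic.update(batch)          # e.g., TD learning on online data only
            actor.update(batch, critic)   # e.g., a SAC-style policy update
```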
We can theoretically prove that this leads to a bound on Q-values. We can then apply this method to train Transformer Q-functions for language modeling and dialogue, robotic control, and a variety of LLM and VLM tasks.

For more, check out the paper here: arxiv.org/abs/2411.05193
December 5, 2024 at 2:49 AM

The equations look a bit more complicated than the method really is; here is the method:
December 5, 2024 at 2:48 AM
New paper by Joey Hong shows how we can train LLMs with value-based RL for multi-turn tasks *just by turning probabilities into Q-values*! This provides an algorithm that can be used for LLMs, VLMs, robotics tasks, etc. with one simple loss function. Thread👇
December 5, 2024 at 2:46 AM
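A heavily simplified, hedged sketch of the "turning probabilities into Q-values" idea in this thread: keep the standard log-likelihood machinery, but weight the cross-entropy on the taken action token by a bootstrapped value target, so the model's token probabilities can be read as (scaled) Q-values. This is a simplification for intuition only, not the paper's exact objective; all names and shapes are illustrative.

```python
# Sketch: a weighted cross-entropy whose weights are bootstrapped value targets,
# letting token probabilities stand in for Q-values.
import torch
import torch.nn.functional as F

def probability_as_q_loss(model, target_model, states, actions,
                          rewards, next_states, gamma=0.99):
    """states / next_states: token-id tensors; actions: (batch,) chosen token ids."""
    log_probs = F.log_softmax(model(states), dim=-1)                  # (batch, vocab)
    chosen_log_prob = log_probs.gather(-1, actions.unsqueeze(-1)).squeeze(-1)

    with torch.no_grad():
        next_probs = F.softmax(target_model(next_states), dim=-1)
        # Read the best next-token probability as a (scaled) value estimate.
        bootstrap_target = rewards + gamma * next_probs.max(dim=-1).values

    # Actions with higher bootstrapped targets get their probability, and
    # hence their implied Q-value, pushed up more.
    return -(bootstrap_target * chosen_log_prob).mean()
```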