Lightnews — Scholar-powered news

Vineeth Yeevani

@vyeevani.bsky.social

140 followers 1.1K following 37 posts

Working on robotics. Prev @ Apple on Vision Pro.
vyeevani.github.io

Vineeth Yeevani

@vyeevani.bsky.social

Also, the technique makes robotics more accessible at the cost of understandability. People accept that trade-off in other tools (LLMs, generative image models), which makes me think it will be useful here too.

February 10, 2025 at 9:25 PM

Vineeth Yeevani

@vyeevani.bsky.social

I was wrong about this. My change of heart was inspired by DexGen (zhaohengyin.github.io/dexteritygen/). They use BC for coarse motion and RL for fine-grained motion. Instant Policy could just do both in a single shot. Instant Policy would be great for tele-op.

February 10, 2025 at 9:18 PM

Vineeth Yeevani

@vyeevani.bsky.social

Goal: a VLM capable of planning + action
Problem: no data
Solution: bootstrap intelligence from a VLM
1. Start with an off-the-shelf VLM
2. Collect rollouts from code-as-policy written by the VLM for a set of tasks
3. GRPO over the rollouts
4. Go to 2
5. Offline RL over the VLM for direct obs->act
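
For the "GRPO over rollouts" step, the core idea of GRPO is to score each rollout relative to the other rollouts sampled for the same task, instead of learning a separate value function. A minimal sketch of that group-relative advantage, assuming one scalar reward per rollout; the function name and the `eps` term are illustrative, and implementations differ on whether they use the population or the sample standard deviation:

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize rewards within one group of rollouts for the same task.

    Assumption: population standard deviation; some implementations use
    the sample standard deviation instead.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps keeps the division finite when every rollout gets the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Four rollouts of one task: two succeeded (reward 1), two failed (reward 0).
    print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
```

When every rollout in a group gets the same reward, all advantages are zero and the group contributes no gradient, which is one reason the task set needs problems the model sometimes solves and sometimes fails.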

February 1, 2025 at 10:06 PM

Vineeth Yeevani

@vyeevani.bsky.social

This btw is just regular old RL + curriculum learning. However, unlike control RL, where we learn on the test set, here the environments differ between train and test time.

January 30, 2025 at 6:32 PM

Vineeth Yeevani

@vyeevani.bsky.social

Recipe:
1. Start with a weak base model + problems that range from really simple to really hard
2. Sort problems by how well the model does on them
3. Pick 75% problems the model can do, 25% it can't. RL with GRPO. Use 10x rollouts + high temperature on the 25%
4. Repeat from step 2 until all problems are solved
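
One round of the recipe's problem selection could look roughly like the sketch below. It assumes a per-problem success rate is already measured; the function name, the 0.5 "can do" threshold, the base rollout count, and the temperature values are all hypothetical placeholders, not values from the post:

```python
import random
from typing import Dict, List, Tuple


def build_curriculum_batch(
    success_rate: Dict[str, float],
    batch_size: int = 8,
    solved_threshold: float = 0.5,   # assumption: cutoff for "model can do it"
    base_rollouts: int = 4,          # assumption: rollouts per solvable problem
    hard_multiplier: int = 10,       # "10x rollouts" on the hard 25%
    base_temperature: float = 0.7,   # assumption
    hard_temperature: float = 1.2,   # assumption: "high temp" on the hard 25%
    seed: int = 0,
) -> List[Tuple[str, int, float]]:
    """Return (problem_id, num_rollouts, temperature) triples for one RL round."""
    rng = random.Random(seed)
    # Sort problems by how well the model currently does on them.
    ranked = sorted(success_rate, key=lambda p: success_rate[p], reverse=True)
    can_do = [p for p in ranked if success_rate[p] >= solved_threshold]
    cannot_do = [p for p in ranked if success_rate[p] < solved_threshold]

    # Aim for 25% unsolved problems, the rest from problems the model can do.
    n_hard = min(len(cannot_do), round(0.25 * batch_size))
    n_easy = min(len(can_do), batch_size - n_hard)

    batch = [(p, base_rollouts, base_temperature) for p in rng.sample(can_do, n_easy)]
    batch += [
        (p, base_rollouts * hard_multiplier, hard_temperature)
        for p in rng.sample(cannot_do, n_hard)
    ]
    return batch


if __name__ == "__main__":
    rates = {f"easy{i}": 0.9 for i in range(10)}
    rates.update({f"hard{i}": 0.0 for i in range(5)})
    for problem, n_rollouts, temperature in build_curriculum_batch(rates):
        print(problem, n_rollouts, temperature)
```

After each GRPO update the success rates would be re-measured and the batch rebuilt, so problems migrate from the hard pool to the solvable pool over time.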

January 30, 2025 at 6:30 PM

Vineeth Yeevani

@vyeevani.bsky.social

Turns out the problem was that the resolution was too low. Going to a higher resolution fixed it.

December 26, 2024 at 10:43 PM

Vineeth Yeevani

@vyeevani.bsky.social

On second thought, rectified flow doesn’t help with this problem. Need to look for something different

December 26, 2024 at 4:25 AM

Vineeth Yeevani

@vyeevani.bsky.social

The samples look like the mean of the input. My suspicion is that similar examples produce a nullcline in the velocity field as t -> 1. Rectified flow should fix this, but I hate the multi-stage process of reflow.

December 25, 2024 at 9:36 PM

Vineeth Yeevani

@vyeevani.bsky.social

Keeping functionality in modular libraries helps, because then not everyone has to rewrite everything. Least cohesion of modules is a great rule of thumb.

December 18, 2024 at 9:11 PM

Vineeth Yeevani

@vyeevani.bsky.social

That’s the best board I’ve ever seen.

December 14, 2024 at 2:11 AM

Vineeth Yeevani

@vyeevani.bsky.social

The reward signal here would be whether or not the user accepts a task completion. You’re fine-tuning to improve the agents' success rate, taking into account the multi-turn decision-making nature of agents.

November 30, 2024 at 1:06 AM
