Vineeth Yeevani
vyeevani.bsky.social
Working on robotics. Prev @ Apple on Vision Pro.
vyeevani.github.io
Also, the technique makes robotics more accessible at the cost of understandability. People accept that trade-off in other tools (LLMs, generative image models), which makes me think it will be useful here too.
February 10, 2025 at 9:25 PM
I was wrong about this. My change of heart was inspired by DexGen (zhaohengyin.github.io/dexteritygen/). They use BC for coarse motion and RL for fine-grained control. Instant Policy could just do both in a single shot. Instant Policy would be great for tele-op.
DexGen
zhaohengyin.github.io
February 10, 2025 at 9:18 PM
Goal: VLM capable of planning + action
Problem: no data
Solution: bootstrap intelligence from the VLM
1. Start with an off-the-shelf VLM
2. Collect rollouts from code-as-policy from the VLM for a set of tasks
3. GRPO over rollouts
4. Go to 2
5. Offline RL over the VLM for direct obs -> act
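A minimal sketch of the inner loop above, under loud assumptions: a two-action toy bandit stands in for rollouts, and the GRPO step is simplified to a policy-gradient update with a group-relative baseline (reward minus the group mean), with all names hypothetical:

```python
import math, random

random.seed(0)

def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

def sample(probs):
    r, acc = random.random(), 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1

def grpo_update(logits, reward_fn, group_size=16, lr=0.5):
    # Step 2: collect a group of rollouts from the current policy.
    probs = softmax(logits)
    actions = [sample(probs) for _ in range(group_size)]
    rewards = [reward_fn(a) for a in actions]
    # Step 3: GRPO-style advantage = reward relative to the group mean.
    baseline = sum(rewards) / len(rewards)
    for a, r in zip(actions, rewards):
        adv = r - baseline
        # Policy-gradient step on the logits: grad log pi = onehot(a) - probs.
        for i in range(len(logits)):
            logits[i] += lr * adv * ((1.0 if i == a else 0.0) - probs[i])
    return logits

# Toy "task": action 1 succeeds, action 0 fails.
reward_fn = lambda a: 1.0 if a == 1 else 0.0
logits = [0.0, 0.0]
for _ in range(50):  # step 4: loop collect -> GRPO
    grpo_update(logits, reward_fn)
```

Once all rollouts in a group succeed, the relative advantage is zero and the policy stops moving, which is why the hard-problem mix matters.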
February 1, 2025 at 10:06 PM
This btw is just regular old RL + curriculum learning. However, unlike control RL where we learn on the test set, here the train-time and test-time environments differ.
January 30, 2025 at 6:32 PM
Recipe.
1. Start with a weak base model + problems that range from really simple to really hard
2. Sort problems by how well the model does on them
3. Pick 75% problems the model can do, 25% it can’t. RL with GRPO. Use 10x rollouts + high temp on the 25%
4. Repeat from step 2 till all problems solved
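A sketch of the batch-selection step in this recipe. Everything here is a hypothetical stand-in: the success rates would come from evaluating the model on each problem, and "can do" is assumed to mean a nonzero success rate:

```python
def pick_curriculum(success_rates, batch_size=20, easy_frac=0.75,
                    base_rollouts=4, hard_multiplier=10):
    """Build a 75/25 mix of solvable and unsolved problems, with 10x the
    rollout budget allocated to the unsolved (exploration) fraction."""
    # Sort problems by how well the model does on them.
    ranked = sorted(success_rates, key=success_rates.get, reverse=True)
    solved = [p for p in ranked if success_rates[p] > 0.0]
    unsolved = [p for p in ranked if success_rates[p] == 0.0]
    n_easy = int(batch_size * easy_frac)
    n_hard = batch_size - n_easy
    batch = [(p, base_rollouts) for p in solved[:n_easy]]
    # High-temperature, 10x-rollout exploration on problems the model can't do yet.
    batch += [(p, base_rollouts * hard_multiplier) for p in unsolved[:n_hard]]
    return batch

# Toy example: 20 problems the model can do, 10 it can't.
rates = {f"p{i}": (0.8 if i < 20 else 0.0) for i in range(30)}
batch = pick_curriculum(rates)
```

With these defaults the batch comes out as 15 solvable problems at the base rollout count and 5 unsolved problems at 10x rollouts.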
January 30, 2025 at 6:30 PM
Turns out the problem was too low resolution; going higher res fixed it.
December 26, 2024 at 10:43 PM
On second thought, rectified flow doesn’t help with this problem. Need to look for something different
December 26, 2024 at 4:25 AM
The samples look like the mean of the input. Suspicion: the similar examples produce a nullcline in the velocity field as t -> 1. Rectified flow should fix this, but I hate the multi-stage process of reflow.
December 25, 2024 at 9:36 PM
Keeping functionality in modular libraries means no one has to rewrite everything. Loose coupling between modules is a great rule of thumb.
December 18, 2024 at 9:11 PM
That’s the best board I’ve ever seen
December 14, 2024 at 2:11 AM
The reward signal here would be whether the user accepts the task completion or not. You’re fine-tuning to improve the agents’ success rate while accounting for the multi-turn decision-making nature of agents.
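One hedged sketch of how that single accept/reject signal could become per-turn training rewards: the end-of-episode outcome is spread back over every turn with a discount, a common credit-assignment trick for multi-turn agents. The structure and names here are hypothetical:

```python
def label_episode(turns, accepted, gamma=0.95):
    """Turn one end-of-episode user signal (accepted or not) into a
    discounted return for each (observation, action) turn."""
    final_reward = 1.0 if accepted else 0.0
    labeled = []
    # Later turns sit closer to the accept/reject decision, so they
    # receive less-discounted credit.
    for i, (obs, act) in enumerate(turns):
        steps_to_end = len(turns) - 1 - i
        labeled.append((obs, act, final_reward * gamma ** steps_to_end))
    return labeled

episode = [("obs0", "act0"), ("obs1", "act1"), ("obs2", "act2")]
labeled = label_episode(episode, accepted=True)
```

A rejected episode labels every turn with zero reward, so only accepted trajectories push the policy anywhere.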
November 30, 2024 at 1:06 AM