Abhishek Gupta
@abhishekunique7.bsky.social
Assistant Professor, Paul G. Allen School of Computer Science and Engineering, University of Washington

Visiting Faculty, NVIDIA

Ph.D. from Berkeley, Postdoc MIT

https://homes.cs.washington.edu/~abhgupta

I like robots and reinforcement learning :)
The key insight: if you’re going to transfer from sim to real, take care to transfer exploration behavior rather than just policies. Take a look at our paper: arxiv.org/abs/2410.20254. Fun collaboration with Andrew Wagenmaker, Kevin Huang, Kay Ke, Byron Boots, @kjamieson.bsky.social (6/6)
Overcoming the Sim-to-Real Gap: Leveraging Simulation to Learn to Explore for Real-World RL
December 6, 2024 at 12:46 AM
Given these insights, we develop a practical instantiation using a diversity-driven policy learning algorithm in sim. This learns diverse exploration behaviors in the neighborhood of an optimal policy (sketched below). Exploring with these sim-learned policies then enables efficient policy learning in the real world. (5/6)
December 6, 2024 at 12:46 AM
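To make the diversity-driven idea concrete, here is a minimal DIAYN-style sketch (a generic skill-discovery objective, not necessarily the paper's exact algorithm; the network sizes, the beta weight, and all names are illustrative): a discriminator tries to infer which skill produced a state, and each skill is rewarded for being identifiable while still collecting task reward.

```python
# Minimal sketch, assuming a DIAYN-style diversity bonus; not the paper's exact method.
import torch
import torch.nn as nn

K = 8            # number of exploration skills (illustrative)
STATE_DIM = 4    # illustrative state dimension

# Discriminator q(z | s): predicts which skill produced a state.
discriminator = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, K))
disc_opt = torch.optim.Adam(discriminator.parameters(), lr=3e-4)

def diversity_bonus(states: torch.Tensor, skills: torch.Tensor) -> torch.Tensor:
    """Intrinsic reward log q(z|s) - log p(z): high when states identify their skill."""
    log_q = torch.log_softmax(discriminator(states), dim=-1)
    log_q_z = log_q.gather(1, skills.unsqueeze(1)).squeeze(1)
    return (log_q_z - torch.log(torch.tensor(1.0 / K))).detach()

def mixed_reward(task_reward, states, skills, beta=0.5):
    """Reward used in sim: stay near-optimal (task term) but diverse (bonus term)."""
    return task_reward + beta * diversity_bonus(states, skills)

def update_discriminator(states, skills):
    """Train q(z|s) to classify which skill visited each state."""
    loss = nn.functional.cross_entropy(discriminator(states), skills)
    disc_opt.zero_grad(); loss.backward(); disc_opt.step()
    return loss.item()
```

Training K skills this way near the task optimum yields a family of distinct, near-optimal behaviors to explore with.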
We show that exploration policies learned in sim can be rolled out in the real world to collect data, with policy improvement done via RL. These policies cannot be played naively; they must be combined with some degree of random exploration (see the sketch below). With that, exploration is now provably efficient! (4/6)
December 6, 2024 at 12:46 AM
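A minimal sketch of that deployment recipe, under my own assumptions (uniform random actions mixed in at rate eps; the paper may use a different mixing scheme):

```python
# Sketch: mix a sim-trained exploration policy with random actions in the real world.
import numpy as np

rng = np.random.default_rng(0)

def explore_action(policy_action, action_low, action_high, eps=0.1):
    """With probability eps take a uniform random action; otherwise follow the
    sim-trained exploration policy. Collected transitions feed off-policy RL."""
    if rng.random() < eps:
        return rng.uniform(action_low, action_high)
    return np.asarray(policy_action)
```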
We propose a simple fix: instead of learning *optimal* policies in sim, learn exploratory policies in sim. Since data collection in sim is cheap, we can learn exploratory policies with broad coverage (one generic recipe is sketched below). Even with domain gap, exploratory policies from sim can still explore effectively in the real world. (3/6)
December 6, 2024 at 12:46 AM
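One generic way to train for broad coverage in sim (a simple count-based bonus of my own choosing, not necessarily the paper's objective): reward the policy for visiting rarely seen states.

```python
# Sketch, assuming a count-based exploration bonus over a discretized state grid.
from collections import defaultdict
import numpy as np

visit_counts = defaultdict(int)

def coverage_bonus(state, bin_size=0.25):
    """Intrinsic reward 1/sqrt(N(s)): decays as a state region is revisited."""
    key = tuple(np.floor(np.asarray(state) / bin_size).astype(int))
    visit_counts[key] += 1
    return 1.0 / np.sqrt(visit_counts[key])
```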
The typical paradigm for sim2real transfers a policy and hopes for few-shot finetuning with naive exploration. First, we show that there exist environments where this can be exponentially inefficient: an overconfident, wrong policy from an incorrect sim can lead to poor real-world exploration. (2/6)
December 6, 2024 at 12:46 AM
I’m excited about the doors this opens for generalizable robot pre-training!

Paper: arxiv.org/abs/2412.01770
Website: casher-robot-learning.github.io/CASHER/

Fun project w/ @marcelto.bsky.social , @arhanjain.bsky.social , Carrie Yuan, Macha V, @ankile.bsky.social, Anthony S, Pulkit Agrawal :)
Robot Learning with Super-Linear Scaling
December 5, 2024 at 2:13 AM
I’m also a sucker for a fun website. Check out our interactive demo, where you can see some of the environments and learned behaviors. We’ve also open-sourced USDZ assets of the scanned environments. (8/N)
December 5, 2024 at 2:13 AM
Why do I care? We’re going to have to consider off-domain data for robotics, and realistic simulation constructed cheaply from video provides a scalable way to source this data. Building methods that scale sub-linearly with human effort makes this practical and generalizable. (7/N)
December 5, 2024 at 2:13 AM
Step 5: One neat feature: in a test environment, human demos aren’t even required. Scan in a video of the environment to build a test-time simulation, and let the generalist model provide itself demos and improve with RL in sim (toy sketch below). This yields over 50% improvement with zero human effort. (6/N)
December 5, 2024 at 2:13 AM
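A toy sketch of that test-time loop. The environment and "generalist" below are stand-ins of my own so the control flow runs end to end: scan -> sim -> self-generated demos, with zero human demonstrations.

```python
# Toy sketch of the test-time self-improvement loop; all components are stand-ins.
import random
random.seed(0)

def build_sim_from_scan(video_path):
    # Stand-in for the real scan->splat->sim step: a 1D "reach x >= 3" toy env.
    class ToySim:
        def reset(self):
            self.x = 0
            return self.x
        def step(self, a):
            self.x += a
            return self.x, self.x >= 3   # (observation, success flag)
    return ToySim()

def generalist(obs):
    # Stand-in generalist policy: usually right, sometimes wrong.
    return 1 if random.random() < 0.7 else -1

def collect_self_demos(sim, policy, n_rollouts=50, horizon=10):
    demos = []
    for _ in range(n_rollouts):
        obs, traj = sim.reset(), []
        for _ in range(horizon):
            a = policy(obs)
            obs, success = sim.step(a)
            traj.append((obs, a))
            if success:
                demos.append(traj)   # keep only successful rollouts as demos
                break
    return demos

sim = build_sim_from_scan("test_scene_scan.mp4")   # hypothetical phone scan
demos = collect_self_demos(sim, generalist)
print(f"{len(demos)} self-generated demos, 0 human demos")
# These demos would then seed demo-bootstrapped RL in the test-time sim (as in step 2).
```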
Step 4: Transfer to the real world, either zero-shot or with some co-training (sketched below). The policies show scaling laws as more experience is encountered, and robust performance across distractors, object positions, visual conditions, and disturbances. (5/N)
December 5, 2024 at 2:13 AM
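If co-training is used rather than pure zero-shot transfer, one common recipe (my assumption, not necessarily CASHER's exact scheme) is to mix sim and real data at a fixed ratio in every training batch:

```python
# Sketch, assuming fixed-ratio batch mixing of sim and real transitions.
import random

def cotrain_batch(sim_data, real_data, batch_size=256, real_frac=0.25):
    """Each batch holds a fixed fraction of (scarce) real-world transitions."""
    n_real = min(int(batch_size * real_frac), len(real_data))
    batch = random.sample(real_data, n_real)
    batch += random.choices(sim_data, k=batch_size - n_real)
    return batch
```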
Step 3: Providing even 10 demos per environment is still expensive. By training generalists on RL data, we get cross-environment generalization that lets the model provide *itself* demos, using human effort only when necessary (see the sketch below). The better the generalist gets, the less human effort is required. (4/N)
December 5, 2024 at 2:13 AM
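A sketch of the "model demos itself, humans fill the gaps" logic; the threshold and helper signatures are my own illustration:

```python
# Sketch: prefer self-generated demos, fall back to human teleop when needed.
def get_demos(env, generalist_rollouts, human_demos, n_demos=10, n_tries=100):
    """generalist_rollouts and human_demos are caller-supplied (hypothetical) hooks."""
    self_demos = [t for t in generalist_rollouts(env, n_tries) if t["success"]]
    if len(self_demos) >= n_demos:
        return self_demos[:n_demos]   # zero human effort in this environment
    # Generalist not yet competent here: top up with human demonstrations.
    return self_demos + human_demos(env, n_demos - len(self_demos))
```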
Step 2: Train policies in these environments with demo-bootstrapped RL (sketched below). A couple of demos are needed to guide exploration, but the heavy lifting is done with large-scale RL in simulation. This takes success rates from 2-3% to >90% with <10 human demos. (3/N)
December 5, 2024 at 2:13 AM
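A minimal sketch of demo-bootstrapped RL as described: seed a replay buffer with the handful of human demos and oversample them so exploration is guided (the 50/50 sampling split is my assumption, not the paper's):

```python
# Sketch: replay buffer seeded with demo transitions, oversampled during RL.
import random

class DemoBootstrappedBuffer:
    def __init__(self, demo_transitions):
        self.demo = list(demo_transitions)   # fixed human-demo transitions
        self.online = []                     # grows during RL in sim

    def add(self, transition):
        self.online.append(transition)

    def sample(self, batch_size, demo_frac=0.5):
        """Half of each batch comes from demos, so exploration stays on track."""
        n_demo = min(int(batch_size * demo_frac), len(self.demo))
        batch = random.sample(self.demo, n_demo)
        if self.online:
            batch += random.choices(self.online, k=batch_size - n_demo)
        return batch
```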
Step 1: Collect lots of environments with video scans; anyone can do it with their phone. I even had my parents scan in a bunch :) We use 3D reconstruction methods like Gaussian splatting to build diverse, visually and geometrically realistic sim environments for training policies (pipeline sketched below). (2/N)
December 5, 2024 at 2:13 AM
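A high-level sketch of that scan-to-sim pipeline. Every helper here is a named placeholder of my own (a real pipeline would use tools like COLMAP for camera poses and a 3D Gaussian-splatting trainer); the stub bodies just make the orchestration runnable:

```python
# Sketch of the scan-to-sim pipeline; all helpers are hypothetical stubs.
def extract_frames(scan_path):
    return [f"{scan_path}:frame{i}" for i in range(3)]   # stub: sample video frames

def estimate_camera_poses(frames):
    return [None] * len(frames)                          # stub: SfM, e.g. COLMAP

def train_gaussian_splat(frames, poses):
    return {"gaussians": len(frames)}                    # stub: 3DGS training

def extract_collision_geometry(splat):
    return "collision_mesh"                              # stub: geometry for physics

def build_sim_scene(splat, mesh):
    return {"visuals": splat, "physics": mesh}           # stub: assemble sim env

def video_to_sim_env(scan_path):
    frames = extract_frames(scan_path)
    poses = estimate_camera_poses(frames)
    splat = train_gaussian_splat(frames, poses)
    mesh = extract_collision_geometry(splat)
    return build_sim_scene(splat, mesh)

env = video_to_sim_env("kitchen_scan.mp4")   # hypothetical phone scan
```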