Harshit Sikchi
@harshitsikchi.bsky.social
Research @OpenAI. I study Reinforcement Learning. PhD from UT Austin. Previously FAIR Paris, Meta US, NVIDIA, CMU, and IIT Kharagpur.
Website: https://hari-sikchi.github.io/
(6/n) With RLZero, you can just pass in a YouTube video and ask an agent to mimic the behavior instantly. This brings us closer to true zero-shot cross-embodiment transfer.
December 11, 2024 at 7:11 AM
(5/n) RLZero’s Prompt to Policy: Asking a humanoid agent to perform a headstand.
December 11, 2024 at 7:11 AM
(4/n) Reward is an inconvenient and easily hackable form of task specification. Now, we can prompt and obtain behaviors zero-shot with language. Example: Asking a walker agent to perform a cartwheel.
December 11, 2024 at 7:11 AM
🤖 Introducing RL Zero 🤖: a new approach to transform language into behavior zero-shot for embodied agents without labeled datasets!
December 11, 2024 at 7:11 AM
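The thread announces the capability without spelling out the machinery. Purely as illustration, here is one plausible shape for a prompt-to-policy pipeline: every function name below is hypothetical, the learned models are replaced by random-feature stand-ins, and this is a sketch of a generic "imagine observations, then imitate them" idea, not RL Zero's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 16  # feature dimension, arbitrary for this sketch

def imagine_observations(prompt: str, horizon: int = 8) -> np.ndarray:
    """Hypothetical stand-in for a text-to-video model: returns `horizon`
    imagined observation features, keyed deterministically on the prompt."""
    seed = sum(map(ord, prompt))
    return np.random.default_rng(seed).normal(size=(horizon, D))

def project_to_agent_space(frames: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for mapping imagined frames into the agent's
    own observation features (identity map here)."""
    return frames

def zero_shot_imitate(imagined: np.ndarray, policy_features: np.ndarray) -> int:
    """Pick the pre-trained policy whose expected features best match the
    imagined behavior (nearest neighbor in feature space)."""
    target = imagined.mean(axis=0)
    return int(np.argmin(np.linalg.norm(policy_features - target, axis=1)))

# Expected features of a pre-trained, task-agnostic policy library
# (random placeholders standing in for a learned representation).
policy_features = rng.normal(size=(32, D))

obs = project_to_agent_space(imagine_observations("do a cartwheel"))
print("selected policy index:", zero_shot_imitate(obs, policy_features))
```

The same interface would accept features extracted from a video instead of a prompt, which is how the "pass in a YouTube video" post above reads.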
What if I told you all solutions for RL lie on a (hyper)plane? We can use that fact to learn a compressed representation of an MDP that unlocks efficient policy inference for any reward function. On this plane, solving RL is equivalent to solving a linearly constrained optimization problem!
December 3, 2024 at 12:33 AM
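The post doesn't include details, but the claim echoes the classical linear-programming view of RL: a policy's discounted state-action occupancy measure d(s,a) satisfies linear Bellman flow constraints, expected return is linear in d, so the optimal policy falls out of a linear program. A minimal sketch on a made-up 2-state, 2-action MDP (all numbers here are arbitrary illustrations, not from the post):

```python
import numpy as np
from scipy.optimize import linprog

# Toy MDP: 2 states, 2 actions, discount gamma.
S, A, gamma = 2, 2, 0.9
P = np.zeros((S, A, S))            # P[s, a, s'] = transition probability
P[0, 0] = [0.9, 0.1]
P[0, 1] = [0.2, 0.8]
P[1, 0] = [0.7, 0.3]
P[1, 1] = [0.1, 0.9]
r = np.array([[0.0, 1.0],          # r[s, a] = reward
              [0.5, 2.0]])
mu0 = np.array([1.0, 0.0])         # initial state distribution

# Variables: occupancy measure d(s, a), flattened to length S*A.
# Objective: maximize sum_{s,a} d(s,a) r(s,a)  ->  minimize -r . d
c = -r.flatten()

# Bellman flow constraints, one linear equality per state s':
#   sum_a d(s', a) = (1 - gamma) mu0(s') + gamma sum_{s,a} P[s,a,s'] d(s,a)
A_eq = np.zeros((S, S * A))
for sp in range(S):
    for s in range(S):
        for a in range(A):
            A_eq[sp, s * A + a] = (1.0 if s == sp else 0.0) - gamma * P[s, a, sp]
b_eq = (1 - gamma) * mu0

res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * (S * A))
d = res.x.reshape(S, A)

# Recover the optimal policy from the occupancy measure.
pi = d / d.sum(axis=1, keepdims=True)
print("occupancy measure:\n", d)
print("optimal policy:\n", pi)
```

Swapping in a different reward vector reuses the exact same constraint set (the "plane"), which is one way to read the "efficient policy inference for any reward function" claim.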