Stannis Zhou
@stanniszhou.bsky.social
Research Scientist at Google DeepMind
stanniszhou.github.io
Joint work with Sivaramakrishnan Swaminathan, Rajkumar Vasudeva Raju, J. Swaroop Guntupalli, Wolfgang Lehrach, Joseph Ortiz, Antoine Dedieu, Miguel Lázaro-Gredilla and Kevin Murphy #DiffusionModels #ReinforcementLearning #Robotics #Control 4/4
November 23, 2024 at 4:33 AM
The disadvantage of MPC is that searching over action trajectories can be slow, so we train another diffusion model (on offline data) that acts as a proposal distribution over action trajectories, then use a simple "sample, score, and rank" (SSR) optimizer. 3/4
November 23, 2024 at 4:33 AM
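A minimal sketch of what the "sample, score, and rank" step described above might look like in Python. The callables `proposal_sample` (standing in for the diffusion proposal trained on offline data), `dynamics_model` (a learned world model), and `reward_fn` are hypothetical names for illustration, not the paper's actual API:

```python
import numpy as np

def ssr_optimizer(proposal_sample, dynamics_model, reward_fn, state, num_samples=64):
    """Sample candidate action trajectories, score them, and return the best one."""
    # Sample: draw candidate action trajectories from the diffusion proposal,
    # conditioned on the current state.  Shape: (num_samples, horizon, action_dim).
    candidates = proposal_sample(state, num_samples)

    # Score: roll out each candidate with the learned dynamics model and
    # accumulate reward along the predicted trajectory.
    scores = []
    for actions in candidates:
        s, total_reward = state, 0.0
        for a in actions:
            s = dynamics_model(s, a)
            total_reward += reward_fn(s, a)
        scores.append(total_reward)

    # Rank: keep the trajectory with the highest predicted return.
    best = int(np.argmax(scores))
    return candidates[best]
```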
The advantages of MPC (over policy learning) are that it can be trained on suboptimal reward-free data, and can then be used to optimize new reward functions on the fly. Below we ask a 2D walker agent to reach different target heights - for 1.4m it has to repeatedly jump! 2/4
November 23, 2024 at 4:33 AM
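A minimal sketch of the receding-horizon MPC loop this implies: the same reward-free model is re-planned against whatever reward is supplied (e.g. "keep the walker at a 1.4m target height"), executing only the first action of each plan. It reuses the hypothetical `ssr_optimizer` sketched above; `env`, `height_reward`, and the state layout are assumptions for illustration:

```python
def height_reward(state, action, target_height=1.4):
    # Hypothetical reward: penalize deviation of torso height from the target
    # (assumes state[0] is the torso height).
    return -abs(state[0] - target_height)

def run_mpc(env, proposal_sample, dynamics_model, reward_fn, num_steps=1000):
    state = env.reset()
    for _ in range(num_steps):
        # Re-plan a full action trajectory at every step against the
        # (possibly new) reward, but execute only its first action.
        plan = ssr_optimizer(proposal_sample, dynamics_model, reward_fn, state)
        state = env.step(plan[0])  # assumes env.step returns the next state
    return state

# Example: the same planner, pointed at a different target height on the fly.
# run_mpc(env, proposal_sample, dynamics_model,
#         lambda s, a: height_reward(s, a, target_height=1.4))
```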