Stannis Zhou
stanniszhou.bsky.social
Stannis Zhou
@stanniszhou.bsky.social
Research Scientist at Google DeepMind
stanniszhou.github.io
The advantages of MPC (over policy learning) are that it can be trained on suboptimal reward-free data, and can then be used to optimize new reward functions on the fly. Below we ask a 2d walker agent to achieve different target heights - for 1.4m it has to repeatedly jump! 2/4
November 23, 2024 at 4:33 AM