Vivek Myers
@vivekmyers.bsky.social
PhD student @Berkeley_AI
reinforcement learning, AI, robotics
Thanks to incredible collaborators Bill Zheng, Anca Dragan, Kuan Fang, and Sergey Levine!

Website: tra-paper.github.io
Paper: arxiv.org/pdf/2502.05454
Temporal Representation Alignment: Successor Features Enable Emergent Compositionality in Robot Instruction Following
February 14, 2025 at 1:39 AM
...but to create truly autonomous, self-improving agents, we must not only imitate, but also 𝘪𝘮𝘱𝘳𝘰𝘷𝘦 upon the capabilities seen in training. Our findings suggest that this improvement might emerge from better task representations, rather than more complex learning algorithms. 7/
February 14, 2025 at 1:39 AM
𝘞𝘩𝘺 𝘥𝘰𝘦𝘴 𝘵𝘩𝘪𝘴 𝘮𝘢𝘵𝘵𝘦𝘳? Recent breakthroughs in both end-to-end robot learning and language modeling have been enabled not through complex TD-based reinforcement learning objectives, but rather through scaling imitation with large architectures and datasets... 6/
February 14, 2025 at 1:39 AM
We validated this in simulation. Across offline RL benchmarks, imitation using our TRA task representations outperformed standard behavioral cloning, especially for stitching tasks. In many cases, TRA beat "true" value-based offline RL, using only an imitation loss. 5/
February 14, 2025 at 1:39 AM
Successor features have long been known to boost RL generalization (Dayan, 1993). Our findings suggest something stronger: successor-based task representations produce emergent capabilities beyond the training tasks, even without RL or explicit subtask decomposition. 4/
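For reference, the standard successor-feature construction this builds on (Dayan, 1993), written in the usual notation: a state's successor features are the expected discounted sum of future state features under the policy. Note the symbols here follow the classical convention and are not identical to the 𝜓(𝑔)/𝜙(𝑠) used later in this thread.

```latex
\psi^{\pi}(s) \;=\; \mathbb{E}_{\pi}\!\left[\, \sum_{t=0}^{\infty} \gamma^{t}\, \phi(s_t) \;\middle|\; s_0 = s \right]
```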
February 14, 2025 at 1:39 AM
This trick encourages a form of time invariance during learning: both nearby and distant goals are represented similarly. By additionally aligning language instructions 𝜉(ℓ) to the goal representations 𝜓(𝑔), the policy can also perform new compound language tasks. 3/
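A rough sketch of how this language-goal alignment could be implemented, assuming a contrastive (InfoNCE-style) objective over matched (instruction, goal) pairs; the encoder interfaces `xi`, `psi` and the loss form are illustrative assumptions, not the paper's exact training recipe.

```python
import torch
import torch.nn.functional as F

def language_goal_alignment(xi, psi, instructions, goals, temp=0.1):
    """Illustrative sketch: pull each instruction embedding xi(l) toward the
    representation psi(g) of the goal it describes, so language commands can
    drive the same goal-conditioned policy."""
    z_l = F.normalize(xi(instructions), dim=-1)
    z_g = F.normalize(psi(goals), dim=-1)
    logits = z_l @ z_g.T / temp                           # pairwise similarities
    labels = torch.arange(len(goals), device=logits.device)
    # Symmetric contrastive loss: matched pairs should score highest.
    return 0.5 * (F.cross_entropy(logits, labels) +
                  F.cross_entropy(logits.T, labels))
```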
February 14, 2025 at 1:39 AM
What does temporal alignment mean? When training, our policy imitates the human actions that lead to the end goal 𝑔 of a trajectory. Rather than training on the raw goals, we use a representation 𝜓(𝑔) that aligns with the “successor features” 𝜙(𝑠) of the preceding states. 2/
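A minimal sketch of what this combined objective might look like in PyTorch, assuming matched (state, action, goal) batches where each goal is its trajectory's end state, continuous actions, and an InfoNCE-style alignment term; all names and hyperparameters here are illustrative, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def tra_loss(policy, phi, psi, states, actions, goals, temp=0.1):
    """Illustrative sketch: imitate the dataset actions that led to each
    trajectory's end goal, conditioning on the goal representation psi(g),
    while aligning psi(g) with the successor features phi(s) of the states
    that precede it."""
    z_s = phi(states)                      # successor features of states
    z_g = psi(goals)                       # representations of end goals

    # Behavioral cloning conditioned on psi(g) rather than the raw goal
    # (continuous actions assumed here).
    bc_loss = F.mse_loss(policy(states, z_g), actions)

    # InfoNCE-style temporal alignment: each state's features should match
    # its own trajectory's goal representation more than other goals'.
    logits = z_s @ z_g.T / temp
    labels = torch.arange(len(states), device=logits.device)
    align_loss = F.cross_entropy(logits, labels)

    return bc_loss + align_loss
```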
February 14, 2025 at 1:39 AM
What does this mean in practice? To generalize to long-horizon goal-reaching behavior, we should consider how our GCRL algorithms and architectures enable invariance to planning. When possible, prefer architectures like quasimetric networks (MRN, IQE) that enforce this invariance. 6/
February 4, 2025 at 8:37 PM
Empirical results support this theory. The degrees of planning invariance and horizon generalization are correlated across environments and GCRL methods. Critics parameterized as a quasimetric distance indeed tend to generalize best across horizons. 5/
February 4, 2025 at 8:37 PM
Similar to how CNN architectures exploit the inductive bias of translation-invariance for image classification, RL policies can enforce planning invariance by using a *quasimetric* critic parameterization that is guaranteed to obey the triangle inequality. 4/
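A simplified sketch of an MRN-style quasimetric critic over learned embeddings; the encoder widths and the embedding split are placeholder assumptions, but the max-ReLU-plus-Euclidean form shown does satisfy d(x, x) = 0 and the triangle inequality.

```python
import torch
import torch.nn as nn

class QuasimetricCritic(nn.Module):
    """Sketch of an MRN-style quasimetric critic (illustrative architecture).
    The asymmetric max-ReLU term plus a Euclidean residual term is guaranteed
    to obey the triangle inequality over the learned embeddings."""

    def __init__(self, obs_dim, dim=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU(),
                                     nn.Linear(256, 2 * dim))
        self.dim = dim

    def forward(self, s, g):
        x, y = self.encoder(s), self.encoder(g)
        xa, xs = x[..., :self.dim], x[..., self.dim:]
        ya, ys = y[..., :self.dim], y[..., self.dim:]
        asym = torch.relu(xa - ya).amax(dim=-1)      # asymmetric component
        sym = torch.linalg.norm(xs - ys, dim=-1)     # symmetric component
        return asym + sym                            # quasimetric distance d(s, g)
```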
February 4, 2025 at 8:37 PM
The key to achieving horizon generalization is *planning invariance*. A policy is planning invariant if decomposing tasks into simpler subtasks doesn't improve performance. We prove planning invariance can enable horizon generalization. 3/
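One informal way to write this property (my paraphrase of the post, not the paper's formal definition): for any waypoint 𝑤, heading straight for the goal 𝑔 should do at least as well as reaching 𝑤 first and then pursuing 𝑔.

```latex
\Pr\!\big[\text{reach } g \mid \text{follow } \pi(\cdot \mid \cdot, g)\big]
\;\ge\;
\Pr\!\big[\text{reach } g \mid \text{follow } \pi(\cdot \mid \cdot, w) \text{ until } w,\ \text{then } \pi(\cdot \mid \cdot, g)\big]
```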
February 4, 2025 at 8:37 PM
Certain RL algorithms are more conducive to horizon generalization than others. Goal-conditioned RL (GCRL) methods with a bilinear critic ϕ(𝑠)ᵀψ(𝑔), as well as quasimetric methods, better enable horizon generalization. 2/
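A minimal sketch of the bilinear critic parameterization ϕ(𝑠)ᵀψ(𝑔) mentioned above; the encoder architectures are placeholder assumptions.

```python
import torch
import torch.nn as nn

class BilinearCritic(nn.Module):
    """Sketch: critic value as an inner product phi(s)^T psi(g) between a
    state encoder and a goal encoder (encoder sizes are illustrative)."""

    def __init__(self, obs_dim, goal_dim, dim=64):
        super().__init__()
        self.phi = nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU(),
                                 nn.Linear(256, dim))
        self.psi = nn.Sequential(nn.Linear(goal_dim, 256), nn.ReLU(),
                                 nn.Linear(256, dim))

    def forward(self, s, g):
        return (self.phi(s) * self.psi(g)).sum(dim=-1)   # phi(s)^T psi(g)
```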
February 4, 2025 at 8:37 PM
Website: empowering-humans.github.io
Paper: arxiv.org/abs/2411.02623

Many thanks to wonderful collaborators Evan Ellis, Sergey Levine, Benjamin Eysenbach, and Anca Dragan!
Learning to Assist Humans without Inferring Rewards
January 22, 2025 at 2:17 AM