Vivek Myers
@vivekmyers.bsky.social
PhD student @Berkeley_AI
reinforcement learning, AI, robotics
Thanks to incredible collaborators Bill Zheng, Anca Dragan, Kuan Fang, and Sergey Levine!

Website: tra-paper.github.io
Paper: arxiv.org/pdf/2502.05454
Temporal Representation Alignment: Successor Features Enable Emergent Compositionality in Robot Instruction Following
February 14, 2025 at 1:39 AM
...but to create truly autonomous, self-improving agents, we must not only imitate, but also 𝘪𝘮𝘱𝘳𝘰𝘷𝘦 upon the capabilities seen in training. Our findings suggest that this improvement might emerge from better task representations, rather than more complex learning algorithms. 7/
February 14, 2025 at 1:39 AM
𝘞𝘩𝘺 𝘥𝘰𝘦𝘴 𝘵𝘩𝘪𝘴 𝘮𝘢𝘵𝘵𝘦𝘳? Recent breakthroughs in both end-to-end robot learning and language modeling have been enabled not through complex TD-based reinforcement learning objectives, but rather through scaling imitation with large architectures and datasets... 6/
February 14, 2025 at 1:39 AM
We validated this in simulation. Across offline RL benchmarks, imitation using our TRA task representations outperformed standard behavioral cloning, especially for stitching tasks. In many cases, TRA beat "true" value-based offline RL, using only an imitation loss. 5/
February 14, 2025 at 1:39 AM
Successor features have long been known to boost RL generalization (Dayan, 1993). Our findings suggest something stronger: successor-based task representations produce emergent capabilities beyond the training tasks, even without RL or explicit subtask decomposition. 4/
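For reference, the standard successor-feature construction this builds on (Dayan, 1993), written in the usual notation: a state's successor features are the expected discounted sum of future state features under the policy. Note the symbols here follow the classical convention and are not identical to the 𝜓(𝑔)/𝜙(𝑠) used later in this thread.

```latex
\psi^{\pi}(s) \;=\; \mathbb{E}_{\pi}\!\left[\, \sum_{t=0}^{\infty} \gamma^{t}\, \phi(s_t) \;\middle|\; s_0 = s \right]
```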
February 14, 2025 at 1:39 AM
This trick encourages a form of time invariance during learning: both nearby and distant goals are represented similarly. By additionally aligning language instructions 𝜉(ℓ) to the goal representations 𝜓(𝑔), the policy can also perform new compound language tasks. 3/
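A rough sketch of how this language-goal alignment could be implemented, assuming a contrastive (InfoNCE-style) objective over matched (instruction, goal) pairs; the encoder interfaces `xi`, `psi` and the loss form are illustrative assumptions, not the paper's exact training recipe.

```python
import torch
import torch.nn.functional as F

def language_goal_alignment(xi, psi, instructions, goals, temp=0.1):
    """Illustrative sketch: pull each instruction embedding xi(l) toward the
    representation psi(g) of the goal it describes, so language commands can
    drive the same goal-conditioned policy."""
    z_l = F.normalize(xi(instructions), dim=-1)
    z_g = F.normalize(psi(goals), dim=-1)
    logits = z_l @ z_g.T / temp                           # pairwise similarities
    labels = torch.arange(len(goals), device=logits.device)
    # Symmetric contrastive loss: matched pairs should score highest.
    return 0.5 * (F.cross_entropy(logits, labels) +
                  F.cross_entropy(logits.T, labels))
```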
February 14, 2025 at 1:39 AM
What does temporal alignment mean? When training, our policy imitates the human actions that lead to the end goal 𝑔 of a trajectory. Rather than training on the raw goals, we use a representation 𝜓(𝑔) that aligns with the “successor features” 𝜙(𝑠) of the preceding states. 2/
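A minimal sketch of what this combined objective might look like in PyTorch, assuming matched (state, action, goal) batches where each goal is its trajectory's end state, continuous actions, and an InfoNCE-style alignment term; all names and hyperparameters here are illustrative, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def tra_loss(policy, phi, psi, states, actions, goals, temp=0.1):
    """Illustrative sketch: imitate the dataset actions that led to each
    trajectory's end goal, conditioning on the goal representation psi(g),
    while aligning psi(g) with the successor features phi(s) of the states
    that precede it."""
    z_s = phi(states)                      # successor features of states
    z_g = psi(goals)                       # representations of end goals

    # Behavioral cloning conditioned on psi(g) rather than the raw goal
    # (continuous actions assumed here).
    bc_loss = F.mse_loss(policy(states, z_g), actions)

    # InfoNCE-style temporal alignment: each state's features should match
    # its own trajectory's goal representation more than other goals'.
    logits = z_s @ z_g.T / temp
    labels = torch.arange(len(states), device=logits.device)
    align_loss = F.cross_entropy(logits, labels)

    return bc_loss + align_loss
```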
February 14, 2025 at 1:39 AM
What does this mean in practice? To generalize to long-horizon goal-reaching behavior, we should consider how our GCRL algorithms and architectures enable invariance to planning. When possible, prefer architectures like quasimetric networks (MRN, IQE) that enforce this invariance. 6/
February 4, 2025 at 8:37 PM
Empirical results support this theory. The degrees of planning invariance and horizon generalization are correlated across environments and GCRL methods. Critics parameterized as a quasimetric distance indeed tend to generalize best across horizons. 5/
February 4, 2025 at 8:37 PM
Similar to how CNN architectures exploit the inductive bias of translation-invariance for image classification, RL policies can enforce planning invariance by using a *quasimetric* critic parameterization that is guaranteed to obey the triangle inequality. 4/
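A simplified sketch of an MRN-style quasimetric critic over learned embeddings; the encoder widths and the embedding split are placeholder assumptions, but the max-ReLU-plus-Euclidean form shown does satisfy d(x, x) = 0 and the triangle inequality.

```python
import torch
import torch.nn as nn

class QuasimetricCritic(nn.Module):
    """Sketch of an MRN-style quasimetric critic (illustrative architecture).
    The asymmetric max-ReLU term plus a Euclidean residual term is guaranteed
    to obey the triangle inequality over the learned embeddings."""

    def __init__(self, obs_dim, dim=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU(),
                                     nn.Linear(256, 2 * dim))
        self.dim = dim

    def forward(self, s, g):
        x, y = self.encoder(s), self.encoder(g)
        xa, xs = x[..., :self.dim], x[..., self.dim:]
        ya, ys = y[..., :self.dim], y[..., self.dim:]
        asym = torch.relu(xa - ya).amax(dim=-1)      # asymmetric component
        sym = torch.linalg.norm(xs - ys, dim=-1)     # symmetric component
        return asym + sym                            # quasimetric distance d(s, g)
```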
February 4, 2025 at 8:37 PM
The key to achieving horizon generalization is *planning invariance*. A policy is planning invariant if decomposing tasks into simpler subtasks doesn't improve performance. We prove planning invariance can enable horizon generalization. 3/
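One informal way to write this property (my paraphrase of the post, not the paper's formal definition): for any waypoint 𝑤, heading straight for the goal 𝑔 should do at least as well as reaching 𝑤 first and then pursuing 𝑔.

```latex
\Pr\!\big[\text{reach } g \mid \text{follow } \pi(\cdot \mid \cdot, g)\big]
\;\ge\;
\Pr\!\big[\text{reach } g \mid \text{follow } \pi(\cdot \mid \cdot, w) \text{ until } w,\ \text{then } \pi(\cdot \mid \cdot, g)\big]
```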
February 4, 2025 at 8:37 PM
Certain RL algorithms are more conducive to horizon generalization than others. Goal-conditioned RL (GCRL) methods with a bilinear critic ϕ(𝑠)ᵀψ(𝑔), as well as quasimetric methods, better enable horizon generalization. 2/
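A minimal sketch of the bilinear critic parameterization ϕ(𝑠)ᵀψ(𝑔) mentioned above; the encoder architectures are placeholder assumptions.

```python
import torch
import torch.nn as nn

class BilinearCritic(nn.Module):
    """Sketch: critic value as an inner product phi(s)^T psi(g) between a
    state encoder and a goal encoder (encoder sizes are illustrative)."""

    def __init__(self, obs_dim, goal_dim, dim=64):
        super().__init__()
        self.phi = nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU(),
                                 nn.Linear(256, dim))
        self.psi = nn.Sequential(nn.Linear(goal_dim, 256), nn.ReLU(),
                                 nn.Linear(256, dim))

    def forward(self, s, g):
        return (self.phi(s) * self.psi(g)).sum(dim=-1)   # phi(s)^T psi(g)
```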
February 4, 2025 at 8:37 PM
Website: empowering-humans.github.io
Paper: arxiv.org/abs/2411.02623

Many thanks to wonderful collaborators Evan Ellis, Sergey Levine, Benjamin Eysenbach, and Anca Dragan!
Learning to Assist Humans without Inferring Rewards
January 22, 2025 at 2:17 AM