Paper: ias.informatik.tu-darmstadt.de/uploads/Team...
Paper: ias.informatik.tu-darmstadt.de/uploads/Team...
Website: nico-bohlinger.github.io/bridge_the_g...
Website: nico-bohlinger.github.io/gait_in_eigh...
- Today: Sim-to-Real Transfer for Humanoid Robots workshop at Humanoids 2025
- Oct 20th: Foundation Models for Robotic Design workshop at IROS 2025
- Oct 24th: Reconfigurable Modular Robots workshop at IROS 2025
Bo Ai, Liu Dai, Dichen Li, Tongzhou Mu, Zhanxin Wu, K. Fay, Henrik I. Christensen, @jan-peters.bsky.social and Hao Su
Thanks to @ias-tudarmstadt.bsky.social, @jan-peters.bsky.social
🔗 Full paper: arxiv.org/abs/2502.11949
✨ Finally, many thanks to @jan-peters.bsky.social and
@ias-tudarmstadt.bsky.social for the support!
We also explored architectures like Universal Neural Functionals (UNF) and action-based representations ("Probing").
And yes, our scaled EPVFs are competitive with PPO and SAC in their final performance.
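One way to picture an action-based ("probing") representation: instead of feeding raw parameters into the value function, describe a policy by the actions it outputs on a set of probe states. This is an assumption-laden illustration, not the paper's code: the probe states are fixed by hand, and `linear_policy`, `mlp_policy`, and `probe_embedding` are hypothetical helpers. It shows one property of the idea: two policies with different parameter counts map to embeddings of the same length.

```python
import math
from typing import Callable, List, Sequence

# A policy maps a state to an action vector.
Policy = Callable[[Sequence[float]], List[float]]


def linear_policy(weights: List[List[float]]) -> Policy:
    # a = tanh(W s); one row of W per action dimension.
    def act(state: Sequence[float]) -> List[float]:
        return [math.tanh(sum(w * s for w, s in zip(row, state))) for row in weights]
    return act


def mlp_policy(w1: List[List[float]], w2: List[List[float]]) -> Policy:
    # One hidden tanh layer: a = tanh(W2 tanh(W1 s)).
    hidden, output = linear_policy(w1), linear_policy(w2)

    def act(state: Sequence[float]) -> List[float]:
        return output(hidden(state))
    return act


def probe_embedding(policy: Policy, probe_states: Sequence[Sequence[float]]) -> List[float]:
    # Hypothetical probing representation: concatenate the policy's actions
    # on a list of probe states (fixed here for simplicity).
    embedding: List[float] = []
    for state in probe_states:
        embedding.extend(policy(state))
    return embedding


probe_states = [[1.0, 0.0, -1.0], [0.5, 0.5, 0.5], [-1.0, 2.0, 0.0], [0.0, 0.0, 1.0]]
small = linear_policy([[0.1, -0.2, 0.3], [0.0, 0.4, -0.1]])  # 6 parameters
large = mlp_policy(
    [[0.2, 0.1, -0.3], [0.5, -0.5, 0.0], [0.1, 0.1, 0.1], [-0.2, 0.3, 0.4]],  # 12 parameters
    [[0.3, -0.1, 0.2, 0.5], [-0.4, 0.2, 0.1, 0.0]],  # 8 parameters
)
emb_small = probe_embedding(small, probe_states)
emb_large = probe_embedding(large, probe_states)
```

The embedding length is (number of probe states) × (action dimension), independent of how many parameters the policy has, which is what makes such representations architecture-agnostic.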
The key ingredients for stability and performance are weight clipping and uniform noise scaled to the parameter magnitudes.
Our ablation studies show just how critical these components are. Without them, performance collapses.
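A minimal sketch of the two ingredients, assuming the clipping is applied to the policy parameters θ and that "scaled to the parameter magnitudes" means a per-parameter uniform range proportional to |θ_j|. `perturb`, `clip_weights`, `rel_scale`, and `limit` are hypothetical names and values, not taken from the paper.

```python
import random
from typing import List


def perturb(theta: List[float], rel_scale: float, rng: random.Random) -> List[float]:
    # Uniform noise scaled to each parameter's magnitude:
    # theta_j + u_j * rel_scale * |theta_j|, with u_j ~ U(-1, 1).
    return [t + rng.uniform(-1.0, 1.0) * rel_scale * abs(t) for t in theta]


def clip_weights(theta: List[float], limit: float) -> List[float]:
    # Clamp every parameter to the interval [-limit, limit].
    return [max(-limit, min(limit, t)) for t in theta]


rng = random.Random(0)
theta = [0.8, -0.05, 2.5, 0.0]
# A population of perturbed-then-clipped policies, e.g. one per parallel rollout.
population = [clip_weights(perturb(theta, 0.1, rng), 1.0) for _ in range(4000)]
```

With purely relative noise, a parameter that is exactly zero is never perturbed; the post does not say whether the actual method adds a floor for that case.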
We see strong scaling effects when using MJX to roll out up to 4000 differently perturbed policies in parallel.
This explores the policy space effectively, and the large batches drastically reduce the variance of the resulting gradients.
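To see why batch size matters, here is a toy, sequential sketch (a simplification, not the paper's estimator): a 1-D parameter is perturbed with uniform noise, each perturbed "policy" gets a noisy return from a hypothetical `toy_return`, and the slope of a least-squares linear fit of return against perturbation serves as the gradient estimate. This loosely mirrors how a value function fitted over perturbed policies yields ∇θ V.

```python
import random
import statistics
from typing import Tuple


def toy_return(theta: float) -> float:
    # Hypothetical 1-D stand-in for an episode return; its true gradient at theta = 0 is 2.0.
    return -(theta - 1.0) ** 2


def estimate_gradient(theta: float, batch_size: int, sigma: float,
                      noise_std: float, rng: random.Random) -> float:
    # Evaluate batch_size perturbed copies of theta (noisy returns), then fit
    # R ≈ a + b * eps by least squares; the slope b estimates dJ/dtheta.
    eps = [rng.uniform(-sigma, sigma) for _ in range(batch_size)]
    rets = [toy_return(theta + e) + rng.gauss(0.0, noise_std) for e in eps]
    mean_e = sum(eps) / batch_size
    mean_r = sum(rets) / batch_size
    cov = sum((e - mean_e) * (r - mean_r) for e, r in zip(eps, rets))
    var = sum((e - mean_e) ** 2 for e in eps)
    return cov / var


def gradient_spread(batch_size: int, trials: int = 50, seed: int = 0) -> Tuple[float, float]:
    # Mean and variance of the gradient estimate over repeated independent batches.
    rng = random.Random(seed)
    grads = [estimate_gradient(0.0, batch_size, 0.1, 0.05, rng) for _ in range(trials)]
    return statistics.mean(grads), statistics.pvariance(grads)


for n in (16, 256, 4000):
    mean, var = gradient_spread(n)
    print(f"batch={n:5d}  mean grad={mean:.3f}  variance={var:.2e}")
```

The variance of the slope estimate shrinks roughly in proportion to 1/N; MJX's role is making large N cheap by simulating the batch in parallel, which this sequential sketch does not attempt.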
This unlocks fully off-policy learning and exploration in policy parameter space using data from any policy, and it leads to probably the simplest DRL algorithm one can imagine:
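A minimal sketch of such a loop around a value function V over policy parameters θ, under loud assumptions: a hypothetical `toy_return` stands in for rollouts, V is an exact least-squares fit on hand-picked diagonal-quadratic features rather than a neural network, and the noise is plain fixed-width uniform noise. Every evaluated (θ, return) pair stays in a buffer and is reused, which is what makes the update off-policy. `epvf_sketch` and all hyperparameters are illustrative, not the paper's.

```python
import random
from typing import List, Tuple

Sample = Tuple[List[float], float]  # (policy parameters, observed return)


def toy_return(theta: List[float], target: List[float]) -> float:
    # Hypothetical stand-in for an episode return; maximal when theta == target.
    return -sum((t - g) ** 2 for t, g in zip(theta, target))


def features(theta: List[float]) -> List[float]:
    # Hand-picked features [1, theta_j..., theta_j^2...] for a tiny V_w(theta).
    return [1.0] + list(theta) + [t * t for t in theta]


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    # Gaussian elimination with partial pivoting for the small system a x = b.
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_value(buffer: List[Sample]) -> List[float]:
    # Least-squares fit of V_w(theta) = w . features(theta) on every stored pair.
    k = len(features(buffer[0][0]))
    ata = [[0.0] * k for _ in range(k)]
    atb = [0.0] * k
    for theta, ret in buffer:
        phi = features(theta)
        for i in range(k):
            atb[i] += phi[i] * ret
            for j in range(k):
                ata[i][j] += phi[i] * phi[j]
    return solve(ata, atb)


def value_gradient(w: List[float], theta: List[float]) -> List[float]:
    # dV/dtheta_j = w_lin_j + 2 * w_quad_j * theta_j for this feature map.
    d = len(theta)
    return [w[1 + j] + 2.0 * w[1 + d + j] * theta[j] for j in range(d)]


def epvf_sketch(theta: List[float], target: List[float], iterations: int = 30,
                batch: int = 64, sigma: float = 0.3, lr: float = 0.25,
                clip: float = 5.0, seed: int = 0) -> List[float]:
    rng = random.Random(seed)
    buffer: List[Sample] = []  # never cleared: data from all past policies is reused
    for _ in range(iterations):
        # 1) Evaluate randomly perturbed policies and store (theta_i, R_i).
        for _ in range(batch):
            perturbed = [t + rng.uniform(-sigma, sigma) for t in theta]
            buffer.append((perturbed, toy_return(perturbed, target)))
        # 2) Fit V_w over parameters; 3) ascend its gradient; 4) clip the weights.
        w = fit_value(buffer)
        grad = value_gradient(w, theta)
        theta = [max(-clip, min(clip, t + lr * g)) for t, g in zip(theta, grad)]
    return theta


final_theta = epvf_sketch([0.0, 0.0], target=[1.5, -2.0])
```

Because the toy return is itself quadratic, the fitted V is exact here and θ halves its distance to the target each step; with real returns and a neural V, neither holds.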
Imagine a value function that understands the policy's parameters directly: V(θ).
This allows for direct, gradient-based policy updates:
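Written out under two assumptions that are not details from the post, namely that V_w is fitted by squared-error regression on the returns R_i observed for parameters θ_i, and that the policy update is plain gradient ascent with step size α, the idea reads:

```latex
\begin{aligned}
w &\leftarrow \arg\min_{w} \; \frac{1}{N} \sum_{i=1}^{N} \big( V_w(\theta_i) - R_i \big)^2 \\
\theta &\leftarrow \theta + \alpha \, \nabla_\theta V_w(\theta)
\end{aligned}
```

Since the pairs (θ_i, R_i) can come from any policy, the first line does not depend on which policy generated the data.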
Also thanks to MAB Robotics for providing the hardware and constant support!