👉 arxiv.org/pdf/2509.25174
Code coming soon.
We’d love feedback & discussion! 💬
Well-conditioned optimization > raw scale.
XQC shows that principled architecture choices can outperform larger, more complex models.
⚡️ Matches or outperforms SimbaV2, BRO, BRC, MRQ, and DRQ-V2
🌿 ~4.5× fewer parameters and ~1/10 the FLOPs of SimbaV2
💪 Especially strong on the hardest tasks: HumanoidBench, DMC Hard & DMC Humanoids from pixels
✅ Only 4 hidden layers
✅ BatchNorm (BN) after each linear layer
✅ Weight-normalized (WN) projection
✅ Cross-entropy (CE) critic loss
Simplicity + principled design = efficiency ⚡️ (minimal sketch below)
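To make the recipe concrete, here is a hypothetical PyTorch sketch of such a critic; the layer width, activation, number of value bins, and exact normalization variants are my assumptions, not the paper's configuration.

```python
# Hypothetical sketch of the recipe above (not the official XQC code).
# Width, activation, and number of value bins are illustrative assumptions.
import torch
import torch.nn as nn

class CategoricalCritic(nn.Module):
    def __init__(self, obs_dim, act_dim, hidden=512, num_bins=101):
        super().__init__()
        layers, in_dim = [], obs_dim + act_dim
        for _ in range(4):                       # only 4 hidden layers
            layers += [nn.Linear(in_dim, hidden),
                       nn.BatchNorm1d(hidden),   # BN after each linear layer
                       nn.ReLU()]
            in_dim = hidden
        self.trunk = nn.Sequential(*layers)
        # WN projection: weight-normalized output layer over value bins
        self.proj = nn.utils.weight_norm(nn.Linear(hidden, num_bins))

    def forward(self, obs, act):
        return self.proj(self.trunk(torch.cat([obs, act], dim=-1)))  # logits

def ce_critic_loss(logits, target_probs):
    # CE critic loss against a target distribution over value bins
    return -(target_probs * logits.log_softmax(dim=-1)).sum(-1).mean()
```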
➡️ Result: Stable effective learning rates and smoother optimization.
Can better conditioning beat scaling?
By analyzing the Hessian eigenspectrum of critic networks, we uncover how different architectural choices shape optimization landscapes.
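For intuition, here is a small sketch (not the paper's analysis code) of one standard way to probe the top of the Hessian eigenspectrum: power iteration on Hessian-vector products. The function name and iteration count are illustrative.

```python
# Illustrative sketch: estimate the dominant Hessian eigenvalue of a loss
# via power iteration on Hessian-vector products (a standard technique,
# not necessarily the paper's exact procedure).
import torch

def top_hessian_eigenvalue(loss, params, iters=20):
    params = [p for p in params if p.requires_grad]
    grads = torch.autograd.grad(loss, params, create_graph=True)
    v = [torch.randn_like(p) for p in params]            # random start vector
    for _ in range(iters):
        # Hessian-vector product: differentiate (grad . v) w.r.t. params
        hv = torch.autograd.grad(grads, params, grad_outputs=v,
                                 retain_graph=True)
        norm = torch.sqrt(sum((h ** 2).sum() for h in hv))
        v = [h / (norm + 1e-12) for h in hv]              # renormalize
    hv = torch.autograd.grad(grads, params, grad_outputs=v, retain_graph=True)
    # Rayleigh quotient v^T H v with ||v|| = 1 approximates the top eigenvalue
    return sum((h * u).sum() for h, u in zip(hv, v)).item()
```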
@hessianai.bsky.social @ias-tudarmstadt.bsky.social @dfki.bsky.social @cs-tudarmstadt.bsky.social
#RL #ML #AI
We'd love to hear your thoughts and feedback!
Come talk to us at RLDM in June in Dublin (rldm.org)
Paper: t.co/Z6QrMxZaPY
✅ Needs 90% fewer parameters (~600K vs. 5M)
✅ Avoids parameter resets
✅ Scales stably with compute
We also compare favorably to the concurrent SIMBA algorithm.
No tricks, just principled normalization. ✨
We match or outperform SOTA on 25 continuous control tasks from DeepMind Control Suite & MyoSuite, including dog 🐕 and humanoid 🧍‍♂️ tasks, across update-to-data (UTD) ratios.
💡 Solution: After each gradient update, we rescale parameters to the unit sphere, preserving plasticity and keeping the effective learning rate stable.
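A minimal sketch of that post-step projection, assuming per-row (per-neuron) normalization of each linear layer's weights; the exact set of parameters and the norm dimension may differ in the paper.

```python
# Hypothetical sketch of the post-update projection described above.
# Assumption: each linear layer's weight rows are rescaled to unit norm;
# the paper may apply this to a different set of parameters.
import torch

@torch.no_grad()
def project_to_unit_sphere(network):
    for m in network.modules():
        if isinstance(m, torch.nn.Linear):
            m.weight.div_(m.weight.norm(dim=1, keepdim=True) + 1e-12)

# usage after every gradient update:
#   optimizer.step()
#   project_to_unit_sphere(critic)
```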
However, BN-regularized networks are scale-invariant w.r.t. their weights, while the gradient scales inversely with the weight norm (van Laarhoven, 2017).
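A one-line way to see this inverse scaling (a standard argument, following van Laarhoven, 2017):

```latex
% Scale invariance: for a BN network, L(\alpha W) = L(W) for all \alpha > 0.
% Differentiating both sides w.r.t. W and applying the chain rule:
\alpha \, \nabla L(\alpha W) = \nabla L(W)
\quad\Longrightarrow\quad
\nabla L(\alpha W) = \frac{1}{\alpha}\, \nabla L(W).
% So as the weight norm grows, gradients shrink and the effective learning rate drifts.
```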
We identify why scaling CrossQ fails—and fix it with a surprisingly effective tweak: Weight Normalization (WN). 🏋️