📄 Paper: arxiv.org/abs/2506.15544
💻 Code: github.com/roger-creus/...
With both interventions, networks now show near-perfect plasticity. They can continually fit changing data across network scales. No collapse 💪
We benchmark against PQN and PPO, and the improvements are remarkable, both in terms of performance and scalability.
Second-order estimators capture curvature, providing more stable updates than first-order methods
Our ablations show the Kron optimizer shines in deep RL, helping agents adapt as learning evolves ✨
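Roughly, Kronecker-factored methods precondition each layer's gradient with curvature estimates built from that layer's inputs and output gradients. Here is a minimal PyTorch sketch of that idea for a single linear layer (illustrative only, with damping and learning-rate values picked for the example; this is not the Kron optimizer used in the paper):

```python
import torch

# KFAC-style sketch for one linear layer y = a @ W.T.
# Precondition the gradient with the two Kronecker factors:
#   A ≈ E[a aᵀ]  (input-activation covariance)
#   S ≈ E[g gᵀ]  (output-gradient covariance)
# Update direction: S⁻¹ · dL/dW · A⁻¹ (damped for numerical stability).
# Illustrative only -- not the paper's Kron optimizer.

torch.manual_seed(0)
in_dim, out_dim, batch, damping, lr = 64, 32, 256, 1e-3, 1e-2

W = torch.randn(out_dim, in_dim, requires_grad=True)
a = torch.randn(batch, in_dim)                # layer inputs
target = torch.randn(batch, out_dim)

y = a @ W.T
y.retain_grad()                               # keep dL/dy for the S factor
loss = ((y - target) ** 2).mean()
loss.backward()

g = y.grad                                    # dL/dy, shape (batch, out_dim)
A = a.T @ a / batch + damping * torch.eye(in_dim)
S = g.T @ g / batch + damping * torch.eye(out_dim)

# Curvature-aware (second-order-style) preconditioned gradient step.
precond_grad = torch.linalg.solve(S, W.grad) @ torch.linalg.inv(A)
with torch.no_grad():
    W -= lr * precond_grad

print(loss.item(), precond_grad.norm().item())
```

The two-sided solve rescales the raw gradient by the layer's local curvature estimate, which is what makes the update better conditioned than a plain first-order step.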
Residual connections create shortcuts that preserve gradient flow and prevent vanishing gradients ⚡️
We extend this with MultiSkip: broadcasting features to all fully connected layers, ensuring direct gradient flow at scale
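A minimal sketch of the idea in PyTorch (my own interpretation; the class and argument names are made up, see the repo for the real architecture): the encoder features are concatenated into every fully connected layer, so each layer keeps a short gradient path back to the input representation.

```python
import torch
import torch.nn as nn

class MultiSkipMLP(nn.Module):
    """Sketch: broadcast the encoder features to every hidden layer
    (concatenation-style skip), so each FC layer has a direct gradient
    path to the input representation. Illustrative only."""

    def __init__(self, feat_dim, hidden_dim, depth, out_dim):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.Linear(hidden_dim + feat_dim if i > 0 else feat_dim, hidden_dim)
            for i in range(depth)
        )
        self.head = nn.Linear(hidden_dim + feat_dim, out_dim)

    def forward(self, feats):
        h = feats
        for i, layer in enumerate(self.layers):
            inp = h if i == 0 else torch.cat([h, feats], dim=-1)
            h = torch.relu(layer(inp))
        return self.head(torch.cat([h, feats], dim=-1))

net = MultiSkipMLP(feat_dim=128, hidden_dim=256, depth=8, out_dim=6)
q_values = net(torch.randn(32, 128))   # e.g. a batch of encoder features
print(q_values.shape)                  # torch.Size([32, 6])
```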
Gradients are at the heart of this instability.
We relate vanishing gradients to classic RL diagnostics (quick checks sketched below):
⚫ Dead neurons
⚫ Representation collapse
⚫ Loss landscape instability (from bootstrapping)
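The first two are straightforward to track. A minimal sketch of the metrics, assuming post-ReLU features (my own simplified versions, not the paper's exact definitions):

```python
import torch
import torch.nn as nn

# Two common plasticity diagnostics (simplified, illustrative versions):
#  - dead-neuron fraction: units whose ReLU output is ~0 on a whole batch
#  - effective feature rank: a proxy for representation collapse

@torch.no_grad()
def dead_neuron_fraction(features, eps=1e-6):
    # features: (batch, num_units) post-ReLU activations
    alive = (features.abs() > eps).any(dim=0)
    return 1.0 - alive.float().mean().item()

@torch.no_grad()
def effective_rank(features, delta=0.01):
    # "srank"-style metric: number of singular values needed to keep
    # (1 - delta) of the spectrum's mass. Lower = more collapsed.
    s = torch.linalg.svdvals(features)
    cum = torch.cumsum(s, dim=0) / s.sum()
    return int((cum < 1.0 - delta).sum().item()) + 1

encoder = nn.Sequential(nn.Linear(32, 256), nn.ReLU())
feats = encoder(torch.randn(1024, 32))
print(dead_neuron_fraction(feats), effective_rank(feats))
```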
But in RL, performance collapses due to non-stationarity!
We simulate this by shuffling labels periodically. The result? Gradients degrade, and bigger networks collapse! 🚫
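A minimal sketch of this kind of probe (not the paper's exact benchmark, and all sizes and hyperparameters here are placeholders): keep the inputs fixed, reshuffle the labels every few hundred steps, and watch whether the network can still fit them.

```python
import torch
import torch.nn as nn

# Non-stationarity probe: inputs stay fixed, labels get re-shuffled
# periodically. A plastic network keeps fitting the new assignment;
# a collapsed one does not. Illustrative sketch only.

torch.manual_seed(0)
x = torch.randn(2048, 32)
y = torch.randint(0, 10, (2048,))

net = nn.Sequential(nn.Linear(32, 256), nn.ReLU(),
                    nn.Linear(256, 256), nn.ReLU(),
                    nn.Linear(256, 10))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for task in range(5):                    # each "task" = a fresh label shuffle
    y = y[torch.randperm(len(y))]        # non-stationarity: targets change
    for step in range(500):
        logits = net(x)
        loss = loss_fn(logits, y)
        opt.zero_grad()
        loss.backward()
        opt.step()
    grad_norm = sum(p.grad.norm() for p in net.parameters()).item()
    acc = (logits.argmax(-1) == y).float().mean().item()
    print(f"task {task}: acc={acc:.2f} grad_norm={grad_norm:.3f}")
```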
Why? Increasing depth and width leads to severe vanishing gradients, causing unstable learning ⚠️
We diagnose this issue across several algorithms: DQN, Rainbow, PQN, PPO, SAC, and DDPG.
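A quick way to see the effect (a sketch of the diagnostic, not the paper's code): build plain ReLU MLPs of increasing depth, backprop a value-style regression loss, and compare per-layer gradient norms. Early layers of deeper nets tend to receive much smaller gradients.

```python
import torch
import torch.nn as nn

# Sketch: measure per-layer gradient norms as depth grows.
# With plain ReLU MLPs, early-layer gradients tend to shrink with depth.

def layer_grad_norms(depth, width=256, batch=256, in_dim=32):
    torch.manual_seed(0)
    layers = []
    for i in range(depth):
        layers += [nn.Linear(in_dim if i == 0 else width, width), nn.ReLU()]
    layers += [nn.Linear(width, 1)]
    net = nn.Sequential(*layers)

    x = torch.randn(batch, in_dim)
    target = torch.randn(batch, 1)            # stand-in for a TD target
    loss = ((net(x) - target) ** 2).mean()
    loss.backward()
    return [m.weight.grad.norm().item()
            for m in net if isinstance(m, nn.Linear)]

for depth in (2, 8, 16):
    norms = layer_grad_norms(depth)
    print(f"depth={depth:2d} first-layer grad={norms[0]:.2e} "
          f"last-layer grad={norms[-1]:.2e}")
```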