Olivier Codol
@oliviercodol.bsky.social
Post-doc at Mila & U de Montréal in Guillaume Lajoie & Matt Perich's labs
Focus on neuroscience, RL for motor learning, neural control of movement, NeuroAI.
Yes! The advantages are much clearer wrt neural computation (memory, expressivity, and gradient propagation) than for exploration per se.
November 7, 2025 at 4:09 AM
Learning through motor noise (exploration) is well documented in humans (lots of cool work from Shadmehr and @olveczky.bsky.social), but the scale is rather small. Here, if the dynamical regime helps exploration, I’d say it should operate within those scales as well.
November 7, 2025 at 3:27 AM
That being said, this is not how we move (execute movements), and in that sense this is a model of learning rather than control.
November 7, 2025 at 3:17 AM
I would say yes, it’s possible. Particularly because a deviation is carried over instead of collapsing back, the filtering effect of nonlinear muscle activations will not impact it as much as it would white noise.
November 7, 2025 at 3:16 AM
As in, whether the edge-of-chaos regime is a consequence of RL’s need for exploration, or a cause of it?
November 7, 2025 at 3:02 AM
As always a huge thank you to my colleagues and supervisors @glajoie.bsky.social @mattperich.bsky.social and @nandahkrishna.bsky.social for helping make this work what it is—and making the journey so fun and interesting
November 6, 2025 at 2:14 AM
We’re pleased to see RL's role in neural plasticity coming increasingly into focus in the motor control community (check out @adrianhaith.bsky.social's latest piece!)
I strongly believe motor learning sits at the interface of many plasticity mechanisms, and RL is an important piece of this puzzle.
New Pre-Print:
www.biorxiv.org/cgi/content/...

We’re all familiar with having to practice a new skill to get better at it, but what really happens during practice? The answer, I propose, is reinforcement learning - specifically policy-gradient reinforcement learning.

Overview 🧵 below...
Policy-Gradient Reinforcement Learning as a General Theory of Practice-Based Motor Skill Learning
Mastering any new skill requires extensive practice, but the computational principles underlying this learning are not clearly understood. Existing theories of motor learning can explain short-term ad...
www.biorxiv.org
November 6, 2025 at 2:10 AM
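For readers less familiar with the term, here is a minimal, purely illustrative sketch of a policy-gradient (REINFORCE-style) update on a toy one-step reaching task. The linear policy, Gaussian motor noise, reward, and hyperparameters are all assumptions made for brevity; this is not the implementation from either preprint.

```python
# Toy policy-gradient (REINFORCE) sketch: a linear Gaussian policy maps a reach
# target to a motor command, exploration comes from motor noise, and the reward
# is the negative distance to the target. Hypothetical example, not the papers' code.
import numpy as np

rng = np.random.default_rng(0)
W = np.zeros((2, 2))        # linear policy parameters: target -> mean motor command
sigma, lr = 0.2, 0.02       # exploration noise scale and learning rate
baseline = 0.0              # running reward baseline to reduce gradient variance

for episode in range(5000):
    target = rng.uniform(-1, 1, size=2)
    mean = W @ target                        # policy mean
    noise = rng.normal(0.0, sigma, size=2)   # motor-noise exploration
    command = mean + noise
    r = -np.linalg.norm(command - target)    # reward: get close to the target
    advantage = r - baseline
    baseline += 0.01 * (r - baseline)
    # For a Gaussian policy, grad_W log pi(command | target) = (noise / sigma^2) target^T
    grad_logp = np.outer(noise / sigma**2, target)
    W += lr * advantage * grad_logp          # ascend the expected reward

test_target = np.array([0.6, -0.3])
print("learned command:", W @ test_target, "target:", test_target)
```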
Alongside the above, we add discussion points that I hope will clarify our stance on the topic of RL in neuroscience and acknowledge important past work that we believe our study complements. We also add several important controls (particularly Figs. S8, S14). Feel free to check it all out!
November 6, 2025 at 2:10 AM
“Edge of chaos” dynamics have long been recognized as a computationally potent dynamical regime that avoids vanishing gradients during learning and affords a system greater memory and expressivity. This stark difference surprised us, and we think it can help explain our results on neural adaptation.
November 6, 2025 at 2:10 AM
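To make the gradient-propagation point concrete, here is a small illustrative sketch with random placeholder weights (not trained models): the norm of a product of local Jacobians, which is what scales backpropagated gradients through an RNN, collapses in a strongly contractive network but is roughly preserved when the spectral radius sits near 1.

```python
# Illustration of vanishing gradients vs edge-of-chaos dynamics in a tanh RNN.
# Gradients backpropagated through time are scaled by products of local Jacobians,
# so we accumulate that product along a trajectory and report its norm.
# Weights are random placeholders rescaled to a chosen spectral radius.
import numpy as np

def jacobian_product_norm(radius, N=64, T=100, seed=0):
    rng = np.random.default_rng(seed)
    W = rng.normal(0, 1.0 / np.sqrt(N), (N, N))
    W *= radius / np.max(np.abs(np.linalg.eigvals(W)))  # rescale spectral radius
    h = rng.normal(0, 0.5, N)
    v = rng.normal(size=N)
    v /= np.linalg.norm(v)                               # unit vector to propagate
    for _ in range(T):
        pre = W @ h
        J = np.diag(1.0 - np.tanh(pre) ** 2) @ W         # local Jacobian of the update
        v = J.T @ v                                      # accumulate the product
        h = np.tanh(pre)
    return np.linalg.norm(v)

print("contractive network (radius 0.5):", jacobian_product_norm(0.5))
print("edge-of-chaos network (radius 1.0):", jacobian_product_norm(1.0))
```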
Indeed, the Lyapunov exponents of RL models largely stay near 0, showing these networks’ dynamics lie at the edge of chaos, whereas SL models’ dynamics are contractive and orderly, retaining very little information in memory for long and showing stereotyped expressivity.
November 6, 2025 at 2:10 AM
Does this mean SL models are very orderly, while RL models lie at the interface between order and chaos? To confirm this formally, we looked at Lyapunov exponents, which measure how fast nearby states diverge. Unlike Jacobians, they capture long-horizon, not just local, dynamics.
November 6, 2025 at 2:10 AM
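For illustration, here is a minimal sketch of how one can estimate the largest Lyapunov exponent of a discrete-time RNN: propagate a tiny state perturbation along a trajectory, measure its growth rate, and renormalize it at each step. The network and inputs are random placeholders, not the trained models from the paper.

```python
# Benettin-style estimate of the largest Lyapunov exponent of a tanh RNN driven by
# time-varying inputs. Exponent < 0: contracting dynamics; ~0: edge of chaos; > 0: chaos.
# Weights and inputs are random placeholders for illustration only.
import numpy as np

N, T, eps = 64, 2000, 1e-6
rng = np.random.default_rng(1)
W = rng.normal(0, 1.2 / np.sqrt(N), (N, N))   # placeholder recurrent weights
U = rng.normal(0, 0.1, (N, 3))                # placeholder input weights
inputs = rng.normal(0, 1, (T, 3))             # time-varying input stream

def step(h, x):
    return np.tanh(W @ h + U @ x)

h = rng.normal(0, 0.5, N)
d = rng.normal(size=N)
d *= eps / np.linalg.norm(d)                  # tiny initial perturbation
log_growth = 0.0
for t in range(T):
    h_next = step(h, inputs[t])
    d_next = step(h + d, inputs[t]) - h_next  # propagate the perturbation
    log_growth += np.log(np.linalg.norm(d_next) / eps)
    d = d_next * (eps / np.linalg.norm(d_next))  # renormalize to keep it small
    h = h_next

print("largest Lyapunov exponent estimate:", log_growth / T)
```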
We looked at local dynamics around fixed points over time. This showed that SL models’ fixed points are indeed very stable, with nearly all modes of their eigenspectrum having magnitude <1. RL models showed many more self-sustaining modes with magnitude ≈1, again demonstrating isometric dynamics.
November 6, 2025 at 2:10 AM
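A minimal sketch of that local analysis, assuming a vanilla tanh RNN with placeholder weights and a stand-in fixed point: linearize the update at the fixed point and look at the eigenvalue magnitudes of the Jacobian, where magnitudes below 1 are contracting modes and magnitudes near 1 are self-sustaining.

```python
# Local stability around a fixed point of h_{t+1} = tanh(W h_t + U x):
# the Jacobian at h* is J = diag(1 - tanh(W h* + U x)^2) @ W, and its eigenvalue
# magnitudes indicate which modes decay (<1) or self-sustain (~1).
# Weights, input, and h_star are placeholders standing in for a trained model.
import numpy as np

N = 64
rng = np.random.default_rng(2)
W = rng.normal(0, 1.0 / np.sqrt(N), (N, N))
U = rng.normal(0, 0.1, (N, 3))
x_ss = np.array([0.5, -0.2, 0.1])     # steady-state input
h_star = np.zeros(N)                  # stand-in for a fixed point found numerically

pre = W @ h_star + U @ x_ss
J = np.diag(1.0 - np.tanh(pre) ** 2) @ W      # Jacobian at the fixed point

mags = np.abs(np.linalg.eigvals(J))
print("fraction of self-sustaining modes (|lambda| >= 0.95):", float(np.mean(mags >= 0.95)))
```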
A dynamical system could fully recover from a state perturbation, or it could amplify it. It turns out supervised learning (SL) models do the former, while reinforcement learning (RL) models do something in between: they act as isometric systems, carrying the perturbation forward without shrinking or amplifying it.
November 6, 2025 at 2:10 AM
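Here is an illustrative version of that perturbation analysis on a generic tanh RNN with placeholder weights: perturb the hidden state, run the perturbed and unperturbed copies under the same inputs, and track the gap between them. A contracting system shrinks the gap, an expanding one amplifies it, and an approximately isometric one preserves it.

```python
# Track how a small state perturbation evolves under identical, time-varying inputs.
# The weights below are random placeholders, not trained SL or RL models.
import numpy as np

N, T = 64, 200
rng = np.random.default_rng(3)
W = rng.normal(0, 0.9 / np.sqrt(N), (N, N))   # placeholder recurrent weights
U = rng.normal(0, 0.1, (N, 3))                # placeholder input weights
inputs = rng.normal(0, 1, (T, 3))

def step(h, x):
    return np.tanh(W @ h + U @ x)

h = np.zeros(N)
h_pert = h + 1e-3 * rng.normal(size=N)        # small state perturbation
gap = []
for t in range(T):
    h, h_pert = step(h, inputs[t]), step(h_pert, inputs[t])
    gap.append(np.linalg.norm(h - h_pert))

print("initial gap:", gap[0], "final gap:", gap[-1])
```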
But a biological brain receives an ever-changing stream of inputs, rarely if ever settling into a steady state. Our models reflect that: their inputs are time-varying.

So we took a slightly different approach, and asked how fixed points evolve over time and across perturbed neural states.
November 6, 2025 at 2:10 AM
Usually, one finds “fixed-point” neural states by determining where neural activity naturally settles under a steady-state input regime. The local dynamics around these points provide valuable information about how neural networks process information, that is, what they compute, and how.
November 6, 2025 at 2:10 AM
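For concreteness, here is a minimal sketch of that standard recipe (in the spirit of Sussillo and Barak's fixed-point finding), assuming a vanilla tanh RNN with placeholder weights: hold the input constant and minimize the "speed" q(h) = 1/2 ||f(h, x) - h||^2, so that minima approximate fixed (or slow) points.

```python
# Find an approximate fixed point of h_{t+1} = tanh(W h_t + U x) under a constant
# input by minimizing the speed q(h) = 0.5 * ||f(h, x) - h||^2.
# The weights here are random placeholders; in practice they come from a trained model.
import numpy as np
from scipy.optimize import minimize

N = 64
rng = np.random.default_rng(4)
W = rng.normal(0, 1.0 / np.sqrt(N), (N, N))
U = rng.normal(0, 0.1, (N, 3))
x_ss = np.array([0.5, -0.2, 0.1])     # the steady-state input we probe

def step(h):
    return np.tanh(W @ h + U @ x_ss)

def speed(h):
    d = step(h) - h
    return 0.5 * d @ d

res = minimize(speed, rng.normal(0, 0.5, N), method="L-BFGS-B")
h_star = res.x                         # approximate fixed point (or slow point)
print("residual speed at h_star:", speed(h_star))
```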
But alignment metrics can overlook the question of what gives rise to the differences they capture. We approached this using a now-established framework in systems neuroscience: dynamical systems theory.
November 6, 2025 at 2:10 AM
This similarity to non-human primate (NHP) neural recordings held not only for geometric similarity metrics (CCA), but also for dynamical similarity. Importantly, it was only evident when our models were trained to control biomechanically realistic effectors.
November 6, 2025 at 2:10 AM
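As a rough illustration of the geometric comparison, here is a sketch of CCA-based similarity between model unit activity and neural recordings using standard scikit-learn components. The random arrays, PCA dimensionality, and number of canonical components are assumptions for the example, not the paper's exact pipeline.

```python
# CCA similarity between two activity datasets (time x units), after reducing each
# to the same dimensionality with PCA. The arrays below are random placeholders
# standing in for model activity and recorded neural activity.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(5)
model_acts = rng.normal(size=(300, 128))    # time x model units (placeholder)
neural_acts = rng.normal(size=(300, 90))    # time x recorded neurons (placeholder)

k = 10                                      # number of components to compare
X = PCA(n_components=k).fit_transform(model_acts)
Y = PCA(n_components=k).fit_transform(neural_acts)

cca = CCA(n_components=k, max_iter=1000)
Xc, Yc = cca.fit_transform(X, Y)
corrs = [np.corrcoef(Xc[:, i], Yc[:, i])[0, 1] for i in range(k)]
print("mean canonical correlation:", float(np.mean(corrs)))
```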