Glen Berseth
@glenberseth.bsky.social
Assistant Prof at @UMontreal @mila-quebec.bsky.social @MontrealRobots. CIFAR AI Chair, RL_Conference chair. Creating generalist problem-solving agents for the real world. He/him/il.
Surprise, empowerment, and similar quantities may be the fundamental objectives living organisms optimize; however, these objectives are very difficult to optimize in practice. I will be giving a talk at the International Workshop on #activeinference on how foundational models can help improve these methods.
October 17, 2025 at 3:13 PM
For those interested in joining my lab, submit your application via the Mila form. This year I am particularly interested in students with skills and interests in robotics, reinforcement learning, and foundational models that will push forward the abilities of real-world agents.
October 15, 2025 at 1:02 PM
There are many ways to learn or compute a critic that can help score the performance of different actions. This is not the full story. If you want more details, go read rlhfbook.com/c/11-policy-...
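For one common flavour (a minimal sketch of my own, not taken from the book): a learned value function can serve as the critic, and an action's score comes from a one-step TD advantage estimate.

```python
# Sketch of a critic-based score (my own minimal example, not from rlhfbook):
# a learned value function V(s) is the baseline, and an action's advantage is
# estimated as A(s, a) ~= r + gamma * V(s') - V(s).
import torch

def td_advantage(value_net, state, next_state, reward, gamma=0.99, done=False):
    """One-step TD advantage estimate with a learned critic."""
    with torch.no_grad():
        v_s = value_net(state)
        v_next = value_net(next_state) * (0.0 if done else 1.0)
    return reward + gamma * v_next - v_s
```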
October 1, 2025 at 12:49 AM
GRPO is more like REINFORCE than PPO.
1) It does not train a critic (no need, since the variance is small).
2) The SCORE FUNCTION (it is difficult to call this an advantage) is computed over a batch of completions from the same initial prompt (similar to the vine sampling method from TRPO). A minimal sketch is below.
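Roughly, under my own simplification (not the exact GRPO implementation), the per-completion score is the reward standardized within its prompt group:

```python
# Rough sketch (my simplification, not the exact GRPO code): group-relative
# scores for a batch of completions sampled from the SAME prompt.
import numpy as np

def group_relative_scores(rewards):
    """Standardize each completion's reward within its prompt group."""
    rewards = np.asarray(rewards, dtype=np.float64)
    baseline = rewards.mean()        # the group mean replaces a learned critic
    scale = rewards.std() + 1e-8     # keep scores at unit scale
    return (rewards - baseline) / scale

# Example: 4 completions of one prompt, scored by a reward model or verifier.
scores = group_relative_scores([1.0, 0.0, 0.5, 0.0])
# As in REINFORCE, the policy gradient weights each completion's log-probability
# by this score rather than by a critic-based advantage as in PPO.
```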
October 1, 2025 at 12:49 AM
On my way to South Korea for a week packed with robotics at the Conference on Robot Learning, Humanoids 2025, and the global forum on mechanical engineering.
September 24, 2025 at 12:23 PM
We compare different checkpoints during the training process.
Vision-Language-Action Planning and Search (VLAPS) significantly outperforms VLA-only baselines on simulated, language-specified robotic tasks, improving success rates by up to 67 percentage points.
August 23, 2025 at 5:52 PM
VLAs offer an avenue for generalist robot policies; however, naively following the action predictions leads to brittle or unsafe behaviours. We introduce VLAPS, which integrates model-based search with pre-trained VLA policies to improve performance without additional training.
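At a high level (the interfaces below are my assumptions, not the actual VLAPS code), the pre-trained VLA acts as a proposal prior inside a model-based search: candidate action sequences are rolled out in a model, scored, and only the best one is executed.

```python
# Hypothetical sketch of search over VLA proposals (NOT the actual VLAPS code;
# the `model` and `vla` interfaces are assumptions for illustration).
import numpy as np

def plan_with_vla(model, vla, obs, instruction, horizon=5, n_candidates=8, seed=0):
    """Return the first action of the best-scoring candidate rollout."""
    rng = np.random.default_rng(seed)
    best_score, best_action = -np.inf, None
    for _ in range(n_candidates):
        state = model.reset_to(obs)                 # rollouts start from the current obs
        score, first_action = 0.0, None
        for _ in range(horizon):
            action = vla.sample_action(state, instruction, rng)  # VLA as proposal prior
            state, reward = model.step(state, action)            # model predicts the outcome
            score += reward
            if first_action is None:
                first_action = action
        if score > best_score:
            best_score, best_action = score, first_action
    return best_action                              # execute, then re-plan (receding horizon)
```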
August 23, 2025 at 5:52 PM
My lab at @montrealrobotics.bsky.social was honoured to present our recent work to @mark-carney.bsky.social and Evan Solomon, explaining how AI enables new robotics that will drive innovation in Canada. It was a pleasure getting into the details with a quick dive into deterministic policy gradients!
August 20, 2025 at 10:59 PM
Another fantastic Montreal Robotics Summer School! Thanks to our sponsors, organizers, and @mila-quebec.bsky.social, we doubled in size this year. Congratulations again to all the students who made this school happen, and on your progress in machine learning and robotics.
August 17, 2025 at 2:23 PM
The team is already growing
August 8, 2025 at 5:03 PM
@rl-conference.bsky.social will be in Montréal next year @umontreal-en.bsky.social!
August 7, 2025 at 2:06 AM
Last, rliable has a measure of the optimality gap between an expert and the learned policy. However, a poor gap conflates exploration and exploitation issues. Our new measure better isolates the exploitation issues and indicates that PPO is the better algorithm compared to DQN.
August 5, 2025 at 3:10 AM
Scaling issues could be the result of narrow exploration from complex distributions or of optimization issues. This method estimates that the difference is large, indicating larger exploitation issues with larger models.
August 5, 2025 at 3:10 AM
Intrinsic rewards, which are designed to help RL algorithms explore, actually increase the difference, aggravating exploitation issues. This is troublesome because, as we develop new exploration methods, they may be generating better experience, but the optimization may ignore it.
August 5, 2025 at 3:10 AM
DQN and PPO only perform half as well as the best experience they generate across a number of environments. The difference is particularly apparent in difficult environments.
August 5, 2025 at 3:10 AM
After the recent LLM results using RL, many are wondering whether progress in exploration or in exploitation is needed to improve deep RL algorithms. This work introduces a new, practical sub-optimality measure to understand how good an RL algorithm is at exploiting its experience.
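In spirit (my own rough simplification, not the paper's exact definition), the measure compares the return of the learned policy to the best return found anywhere in its own experience:

```python
# Rough sketch of an "exploitation gap" (my simplification, not the paper's
# exact measure): how far the final policy falls short of the best return the
# agent ever collected during training.
import numpy as np

def exploitation_gap(training_returns, final_policy_returns):
    """Fraction of the best discovered return that the final policy fails to achieve."""
    best_found = np.max(training_returns)          # best experience the agent generated
    achieved = np.mean(final_policy_returns)       # what the learned policy actually achieves
    return (best_found - achieved) / (abs(best_found) + 1e-8)

# Example: one training episode reached a return of 100, but the final policy
# averages 50 -> gap of ~0.5, i.e. "half as well as the best experience".
print(exploitation_gap([20.0, 55.0, 100.0, 80.0], [48.0, 52.0, 50.0]))
```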
August 5, 2025 at 3:10 AM
I have been cooking some code for training large generalist robotics policies that is almost ready for sharing! I will be presenting a tutorial on the code in a few weeks at an #IVADO LLM/VLM agents boot camp. Come check out the most agentic system with full robotics control.
ivado.ca/en/events/bo...
July 29, 2025 at 12:38 AM
Overall, our method obtains competitive results on stitching tasks from OGBench compared to other representation learning objectives. 5/6
June 21, 2025 at 2:32 PM
We can highlight the generalization gap as we try to reach more distant goals requiring combinatorial generalization > 4 (red line). While all methods show reduced success rates as goals become more OOD, the better policy representations from BYOL-γ help close the gap. 4/6
June 21, 2025 at 2:32 PM
BYOL-γ predicts future states sampled at geometrically distributed offsets. This leads to a correspondence with the successor representation in finite MDPs, and to representations that better facilitate policy generalization when used as an auxiliary loss. 2/6
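A minimal sketch of the auxiliary loss as I read it from this thread (the names and details are my assumptions, not the paper's implementation): sample a future offset from a geometric distribution with parameter 1 − γ and predict its target-network embedding, BYOL-style.

```python
# Sketch of a BYOL-style geometric self-prediction loss (assumed details, not
# the paper's exact code). The target encoder would typically be an EMA copy
# of the online encoder, as in BYOL.
import torch
import torch.nn.functional as F

def byol_gamma_loss(online_encoder, predictor, target_encoder, trajectory, gamma=0.95):
    """trajectory: [T, state_dim] states from one episode."""
    T = trajectory.shape[0]
    t = torch.randint(0, T - 1, (1,)).item()                     # anchor state
    k = int(torch.distributions.Geometric(1 - gamma).sample()) + 1
    t_future = min(t + k, T - 1)                                  # clip to the episode end
    online = predictor(online_encoder(trajectory[t]))             # predict the future embedding
    with torch.no_grad():
        target = target_encoder(trajectory[t_future])             # stop-gradient target branch
    return F.mse_loss(F.normalize(online, dim=-1), F.normalize(target, dim=-1))
```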
June 21, 2025 at 2:32 PM
How can we make behavioural cloning (BC) achieve better combinatorial generalization on out-of-distribution goals?
We propose BYOL-γ: an auxiliary self-predictive loss to improve generalization for goal-conditioned BC. 🧵1/6
June 21, 2025 at 2:32 PM
Great dialogue between Michael Littman and Kate Hartley providing an overview of how RL, AGI, and imitation learning have arrived where they are, and the ingredients needed to make "AGI". @rldmdublin2025.bsky.social
June 13, 2025 at 4:11 PM
In my last lecture on large-scale #robotlearning, I cover one of the directions I find most interesting: generalization across sequences and robots. This generalization across sequences of actions or states is a challenging, data-intensive process that requires experience across various robots and tasks.
May 28, 2025 at 1:35 PM
Tomorrow @iclr-conf.bsky.social we will present a method (SFM) for jointly learning state features and matching successor features, enabling strong imitation without action labels or adversarial training. Find us at Hall 3 + Hall 2B #572.
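For context (my paraphrase of the idea, not the paper's exact objective): successor features are the expected discounted sum of state features under a policy, and imitation reduces to matching them to the expert's.

$$\psi^{\pi}(s) \;=\; \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\,\phi(s_t)\;\middle|\;s_0 = s\right],
\qquad
\min_{\pi}\;\big\lVert\, \psi^{\pi} - \psi^{\text{expert}} \,\big\rVert_2^2 .$$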
April 25, 2025 at 1:15 AM
Autonomous learning agents need careful design and tooling to achieve useful levels of interaction in the real world. This lecture connects autonomous systems to recent ideas around #agenticmodels. These are key to learning from real-world interactions (#reinforcementlearning).
April 18, 2025 at 1:13 AM