Glen Berseth
@glenberseth.bsky.social
Assistant Prof at @UMontreal @mila-quebec.bsky.social @MontrealRobots. CIFAR AI Chair, RL_Conference chair. Creating generalist problem-solving agents for the real world. He/him/il.
Surprise, empowerment, and similar quantities may be the fundamental objectives living organisms optimize; however, these objectives are very difficult to optimize in practice. I will be giving a talk at the International Workshop on #activeinference on how foundational models can help improve these methods.
October 17, 2025 at 3:13 PM
For those interested in joining my lab, submit your application via the Mila form. This year I am particularly interested in students with skills and interests in robotics, reinforcement learning, and foundational models that will push forward the abilities of real-world agents.
October 15, 2025 at 1:02 PM
There are many ways to learn or compute a critic that can help score the performance of different actions. This is not the full story. If you want more details, go read rlhfbook.com/c/11-policy-...
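For one common flavour (a minimal sketch of my own, not taken from the book): a learned value function can serve as the critic, and an action's score comes from a one-step TD advantage estimate.

```python
# Sketch of a critic-based score (my own minimal example, not from rlhfbook):
# a learned value function V(s) is the baseline, and an action's advantage is
# estimated as A(s, a) ~= r + gamma * V(s') - V(s).
import torch

def td_advantage(value_net, state, next_state, reward, gamma=0.99, done=False):
    """One-step TD advantage estimate with a learned critic."""
    with torch.no_grad():
        v_s = value_net(state)
        v_next = value_net(next_state) * (0.0 if done else 1.0)
    return reward + gamma * v_next - v_s
```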
October 1, 2025 at 12:49 AM
GRPO is more like REINFORCE than PPO.
1) It does not train a critic (no need, since the variance is small).
2) The SCORE FUNCTION (it is difficult to call this an advantage) is computed over a batch of completions from the same initial prompt (similar to the vine sampling method from TRPO). A minimal sketch is below.
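Roughly, under my own simplification (not the exact GRPO implementation), the per-completion score is the reward standardized within its prompt group:

```python
# Rough sketch (my simplification, not the exact GRPO code): group-relative
# scores for a batch of completions sampled from the SAME prompt.
import numpy as np

def group_relative_scores(rewards):
    """Standardize each completion's reward within its prompt group."""
    rewards = np.asarray(rewards, dtype=np.float64)
    baseline = rewards.mean()        # the group mean replaces a learned critic
    scale = rewards.std() + 1e-8     # keep scores at unit scale
    return (rewards - baseline) / scale

# Example: 4 completions of one prompt, scored by a reward model or verifier.
scores = group_relative_scores([1.0, 0.0, 0.5, 0.0])
# As in REINFORCE, the policy gradient weights each completion's log-probability
# by this score rather than by a critic-based advantage as in PPO.
```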
October 1, 2025 at 12:49 AM
On my way to South Korea for a week packed with robotics at the Conference on Robot Learning, Humanoids 2025, and the global forum on mechanical engineering.
September 24, 2025 at 12:23 PM
We compare different checkpoints during the training process.
Vision-Language-Action Planning and Search (VLAPS) significantly outperforms VLA-only baselines on simulated, language-specified robotic tasks, improving success rates by up to 67 percentage points.
August 23, 2025 at 5:52 PM
VLAs offer an avenue for generalist robot policies; however, naively following the action predictions leads to brittle or unsafe behaviours. We introduce VLAPS, which integrates model-based search with pre-trained VLA policies to improve performance without additional training.
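At a high level (the interfaces below are my assumptions, not the actual VLAPS code), the pre-trained VLA acts as a proposal prior inside a model-based search: candidate action sequences are rolled out in a model, scored, and only the best one is executed.

```python
# Hypothetical sketch of search over VLA proposals (NOT the actual VLAPS code;
# the `model` and `vla` interfaces are assumptions for illustration).
import numpy as np

def plan_with_vla(model, vla, obs, instruction, horizon=5, n_candidates=8, seed=0):
    """Return the first action of the best-scoring candidate rollout."""
    rng = np.random.default_rng(seed)
    best_score, best_action = -np.inf, None
    for _ in range(n_candidates):
        state = model.reset_to(obs)                 # rollouts start from the current obs
        score, first_action = 0.0, None
        for _ in range(horizon):
            action = vla.sample_action(state, instruction, rng)  # VLA as proposal prior
            state, reward = model.step(state, action)            # model predicts the outcome
            score += reward
            if first_action is None:
                first_action = action
        if score > best_score:
            best_score, best_action = score, first_action
    return best_action                              # execute, then re-plan (receding horizon)
```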
August 23, 2025 at 5:52 PM
My lab at @montrealrobotics.bsky.social was honoured to present our recent work to @mark-carney.bsky.social and Evan Solomon, explaining how AI enables new robotics that will drive innovation in Canada. It was a pleasure getting into the details with a quick dive into deterministic policy gradients!
August 20, 2025 at 10:59 PM
Another fantastic Montreal Robotics Summer School! Thanks to our sponsors, organizers, and @mila-quebec.bsky.social, we doubled in size this year. Congratulations again to all the students who made this school happen, and on your progress in machine learning and robotics.
August 17, 2025 at 2:23 PM
The team is already growing
August 8, 2025 at 5:03 PM
@rl-conference.bsky.social will be in Montréal next year @umontreal-en.bsky.social!
August 7, 2025 at 2:06 AM
Last, rliable has a measure of the optimality gap between an expert and the learned policy. However, a poor gap conflates exploration and exploitation issues. Our new measure better isolates the exploitation issues and indicates that PPO is the better algorithm compared to DQN.
August 5, 2025 at 3:10 AM
Scaling issues could be the result of narrow exploration from complex distributions or of optimization issues. This method estimates that the difference is large, indicating larger exploitation issues with larger models.
August 5, 2025 at 3:10 AM
Intrinsic rewards, which are designed to help RL algorithms explore, actually increase the difference, aggravating exploitation issues. This is troublesome because, as we develop new exploration methods, they may be generating better experience, but the optimization may ignore it.
August 5, 2025 at 3:10 AM
DQN and PPO only perform half as well as the best experience they generate across a number of environments. The difference is particularly apparent in difficult environments.
August 5, 2025 at 3:10 AM
After the recent LLM results using RL, many are wondering whether progress in exploration or in exploitation is needed to improve deep RL algorithms. This work introduces a new, practical sub-optimality measure to understand how good an RL algorithm is at exploiting its experience.
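In spirit (my own rough simplification, not the paper's exact definition), the measure compares the return of the learned policy to the best return found anywhere in its own experience:

```python
# Rough sketch of an "exploitation gap" (my simplification, not the paper's
# exact measure): how far the final policy falls short of the best return the
# agent ever collected during training.
import numpy as np

def exploitation_gap(training_returns, final_policy_returns):
    """Fraction of the best discovered return that the final policy fails to achieve."""
    best_found = np.max(training_returns)          # best experience the agent generated
    achieved = np.mean(final_policy_returns)       # what the learned policy actually achieves
    return (best_found - achieved) / (abs(best_found) + 1e-8)

# Example: one training episode reached a return of 100, but the final policy
# averages 50 -> gap of ~0.5, i.e. "half as well as the best experience".
print(exploitation_gap([20.0, 55.0, 100.0, 80.0], [48.0, 52.0, 50.0]))
```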
August 5, 2025 at 3:10 AM
I have been cooking some code for training large generalist robotics policies that is almost ready for sharing! I will be presenting a tutorial on the code in a few weeks at an #IVADO LLM/VLM agents boot camp. Come check out the most agentic system with full robotics control.
ivado.ca/en/events/bo...
July 29, 2025 at 12:38 AM
Overall, our method obtains competitive results on stitching tasks from OGBench compared to other representation learning objectives. 5/6
June 21, 2025 at 2:32 PM
We can highlight the generalization gap as we try to reach more distant goals requiring combinatorial generalization > 4 (red line). While all methods show reduced success rates as goals become more OOD, the better policy representations from BYOL-γ help close the gap. 4/6
June 21, 2025 at 2:32 PM
BYOL-γ predicts future states sampled at geometrically distributed offsets. This leads to a correspondence with the successor representation in finite MDPs, and to representations that better facilitate policy generalization when used as an auxiliary loss. 2/6
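A minimal sketch of the auxiliary loss as I read it from this thread (the names and details are my assumptions, not the paper's implementation): sample a future offset from a geometric distribution with parameter 1 − γ and predict its target-network embedding, BYOL-style.

```python
# Sketch of a BYOL-style geometric self-prediction loss (assumed details, not
# the paper's exact code). The target encoder would typically be an EMA copy
# of the online encoder, as in BYOL.
import torch
import torch.nn.functional as F

def byol_gamma_loss(online_encoder, predictor, target_encoder, trajectory, gamma=0.95):
    """trajectory: [T, state_dim] states from one episode."""
    T = trajectory.shape[0]
    t = torch.randint(0, T - 1, (1,)).item()                     # anchor state
    k = int(torch.distributions.Geometric(1 - gamma).sample()) + 1
    t_future = min(t + k, T - 1)                                  # clip to the episode end
    online = predictor(online_encoder(trajectory[t]))             # predict the future embedding
    with torch.no_grad():
        target = target_encoder(trajectory[t_future])             # stop-gradient target branch
    return F.mse_loss(F.normalize(online, dim=-1), F.normalize(target, dim=-1))
```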
June 21, 2025 at 2:32 PM
How can we make behavioural cloning (BC) achieve better combinatorial generalization on out-of-distribution goals?
We propose BYOL-γ: an auxiliary self-predictive loss to improve generalization for goal-conditioned BC. 🧵1/6
June 21, 2025 at 2:32 PM
Great dialogue between Michael Littman and Kate Hartley providing an overview of how RL, AGI, and imitation learning have arrived where they are, and the ingredients needed to make "AGI". @rldmdublin2025.bsky.social
June 13, 2025 at 4:11 PM
In my last lecture on large-scale #robotlearning, I cover one of the directions I find most interesting: generalization across sequences and robots. This generalization across sequences of actions or states is a challenging, data-intensive process that requires experience across various robots and tasks.
May 28, 2025 at 1:35 PM
Tomorrow @iclr-conf.bsky.social we will present a method (SFM) for jointly learning state features and matching successor features, enabling strong imitation without action labels or adversarial training. Find us at Hall 3 + Hall 2B #572.
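For context (my paraphrase of the idea, not the paper's exact objective): successor features are the expected discounted sum of state features under a policy, and imitation reduces to matching them to the expert's.

$$\psi^{\pi}(s) \;=\; \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\,\phi(s_t)\;\middle|\;s_0 = s\right],
\qquad
\min_{\pi}\;\big\lVert\, \psi^{\pi} - \psi^{\text{expert}} \,\big\rVert_2^2 .$$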
April 25, 2025 at 1:15 AM
Autonomous learning agents need careful design and tooling to achieve useful levels of interaction in the real world. This lecture connects autonomous systems to recent ideas around #agenticmodels. These are key to learning from real-world interactions (#reinforcementlearning).
April 18, 2025 at 1:13 AM