Associate Professor at Polytechnique Montréal and Mila. 🇨🇦
academic.sologen.net
They show that the Mirror Descent algorithm is a particular way of discretizing a certain geometry-aware gradient flow.
proceedings.mlr.press/v130/gunasek...
Interesting paper!
Interesting paper on Policy Gradient (PG) methods!
PG >> NPG or PG << NPG?! It depends on your estimator.
arxiv.org/abs/2110.15572
Claas defended his PhD on Friday (Learning to model what matters–Representations and world models for efficient reinforcement learning) at the University of Toronto.
Claas is my 3rd graduated PhD (🌟🌟🌟), and I'm so proud of him!
@nicolaspapernot.bsky.social for being an expert PhD committee member, the internal examiner (Vardan Papyan), the external examiner (David Evans), and the chair (George Eleftheriades) for all being very helpful.
If we think carefully, we are (implicitly) making three claims.
#FoundationsOfReinforcementLearning #sneakpeek
PANDAS significantly outperforms many-shot jailbreaking by introducing:
✅Positive affirmations
❌Negative demonstrations
🎯Adaptive demo sampling
Paper: arxiv.org/abs/2502.01925
I will recruit graduate students on the algorithmic and theoretical aspects of Reinforcement Learning.
You will join Adage, @mila-quebec.bsky.social and @polymtl.bsky.social.
More info on why and how you should apply:
academic.sologen.net/2024/11/22/g...
Deadline: Dec 1st