Amir-massoud Farahmand
sologen.bsky.social
Research Goal: Understanding the computational and statistical principles required to design AI/RL agents.
Associate Professor at Polytechnique Montréal and Mila. 🇨🇦
academic.sologen.net
Mirrorless Mirror Descent: A Natural Derivation of Mirror Descent (Gunasekar, Woodworth, Srebro at AISTATS, 2021)
They show that the Mirror Descent algorithm is a particular discretization of a certain geometry-aware gradient flow.
proceedings.mlr.press/v130/gunasek...
Interesting paper!
October 11, 2025 at 7:35 PM
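A minimal sketch of the idea in the Mirror Descent post above, not code from the paper. It assumes the standard textbook setting: the potential is the negative entropy psi(w) = sum_i w_i log w_i on the probability simplex, and the flow d/dt grad_psi(w) = -grad_f(w) is discretized with a forward-Euler step in the dual (log) coordinates, followed by renormalization onto the simplex. This yields the familiar exponentiated-gradient update. The function names, the linear objective, and the step size are all hypothetical choices made for illustration.

```python
import math


def mirror_descent_step(w, grad, eta):
    """One entropic mirror descent step on the probability simplex.

    Assumed setting (illustrative only): potential psi(w) = sum_i w_i log w_i,
    so the dual coordinates are log w_i up to a constant.
    """
    # Forward-Euler step in dual coordinates: log w_i <- log w_i - eta * grad_i.
    logits = [math.log(wi) - eta * gi for wi, gi in zip(w, grad)]
    # Map back to the primal space and renormalize onto the simplex.
    # Subtracting the max keeps the exponentials numerically stable.
    m = max(logits)
    unnormalized = [math.exp(l - m) for l in logits]
    z = sum(unnormalized)
    return [u / z for u in unnormalized]


def run(c, steps=200, eta=0.5):
    """Minimize the linear function f(w) = <c, w> over the simplex.

    The gradient of f is the constant vector c (a toy objective chosen
    only to keep the example short).
    """
    w = [1.0 / len(c)] * len(c)  # start at the uniform distribution
    for _ in range(steps):
        w = mirror_descent_step(w, c, eta)
    return w


if __name__ == "__main__":
    # The mass should concentrate on the coordinate with the smallest cost.
    print(run([1.0, 0.2, 0.7]))
```

Applying the same forward-Euler step directly to w instead of to log w would give a different algorithm (a natural-gradient-style update), which is the sense in which the choice of discretization matters.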
Understanding the Effect of Stochasticity in Policy Optimization (NeurIPS 2021) by Jincheng Mei, Bo Dai, Chenjun Xiao, @skiandsolve.bsky.social, Dale Schuurmans.

Interesting paper on Policy Gradient (PG) methods!
PG >> NPG or PG << NPG?! It depends on your estimator.
arxiv.org/abs/2110.15572
October 1, 2025 at 5:57 PM
What are we talking about when we talk about Dynamic Programming?

#ReinforcementLearning
August 3, 2025 at 8:14 PM
Let me introduce Dr. Claas Voelcker! @cvoelcker.bsky.social

Claas defended his PhD on Friday (Learning to model what matters–Representations and world models for efficient reinforcement learning) at the University of Toronto.
Claas is my 3rd graduated PhD (🌟🌟🌟), and I'm so proud of him!
July 28, 2025 at 5:30 PM
Also thanks to Rich Zemel for being a supportive co-supervisor,
@nicolaspapernot.bsky.social for being an expert PhD committee member, the internal examiner (Vardan Papyan), the external examiner (David Evans), and the chair (George Eleftheriades) for all being very helpful.
July 28, 2025 at 5:24 PM
While folks are discussing how well LLMs are solving IMO problems (however impressive, still sub-Bronze), I am hoping to see some of these brilliant students in my application pool in 4 years!
July 19, 2025 at 5:14 AM
What do we talk about when we talk about the Bellman Optimality Equation?

If we think carefully, we are (implicitly) making three claims.

#FoundationsOfReinforcementLearning #sneakpeek
July 8, 2025 at 11:07 PM
We introduce PANDAS🐼, a jailbreaking method that exploits LLMs' long-context capabilities!
PANDAS significantly outperforms many-shot jailbreaking by introducing:
✅Positive affirmations
❌Negative demonstrations
🎯Adaptive demo sampling
Paper: arxiv.org/abs/2502.01925
February 7, 2025 at 5:08 PM
Be careful when you use ChatGPT as a math-buddy! It can BS confidently and strongly!
December 6, 2024 at 4:11 AM
🎉Good news, everyone! 🎉
I will recruit graduate students to work on the algorithmic and theoretical aspects of Reinforcement Learning.
You will join Adage, @mila-quebec.bsky.social and @polymtl.bsky.social.
More info on why and how you should apply:
academic.sologen.net/2024/11/22/g...
Deadline: Dec 1st
November 29, 2024 at 3:23 AM