Lightnews — Scholar-powered news

Amir-massoud Farahmand

@sologen.bsky.social

640 followers 200 following 150 posts

Research Goal: Understanding the computational and statistical principles required to design AI/RL agents.
Associate Professor at Polytechnique Montréal and Mila. 🇨🇦
academic.sologen.net


Amir-massoud Farahmand

@sologen.bsky.social

Mirrorless Mirror Descent: A Natural Derivation of Mirror Descent (Gunasekar, Woodworth, Srebro at AISTATS, 2021)
They show that the Mirror Descent algorithm is a particular discretization of a certain geometry-aware gradient flow.
proceedings.mlr.press/v130/gunasek...
Interesting paper!

A table showing that different discretizations of the Riemannian Gradient Flow lead to different algorithms:

Geometry discretized, Objective discretized: Natural Gradient Descent
Geometry not discretized, Objective discretized: Mirror Descent
Geometry discretized, Objective not discretized: unknown (?)
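For readers who have not seen Mirror Descent written out, here is a minimal sketch of the standard update, which takes a gradient step in the dual (mirror) coordinates and maps back. It is not the paper's derivation. The entropy potential, the quadratic objective, and the names `grad_f`, `mirror_descent`, `target`, and `step` are all assumptions chosen for illustration.

```python
import math

# Assumed objective (hypothetical): f(x) = 0.5 * sum((x_i - t_i)^2), with t_i > 0.
def grad_f(x, target):
    return [xi - ti for xi, ti in zip(x, target)]

def mirror_descent(x0, target, step=0.1, iters=500):
    """Mirror Descent with the potential phi(x) = sum(x_i * log(x_i) - x_i).

    The gradient of phi is log(x), so the dual-space update
        log(x_next) = log(x) - step * grad_f(x)
    becomes a multiplicative update in the original coordinates.
    """
    x = list(x0)
    for _ in range(iters):
        g = grad_f(x, target)
        x = [xi * math.exp(-step * gi) for xi, gi in zip(x, g)]
    return x

target = [1.0, 2.0, 0.5]
x_final = mirror_descent([1.0, 1.0, 1.0], target)
```

Because the update is multiplicative, every iterate stays strictly positive without any projection step, which is one way the choice of geometry shows up in the algorithm.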

October 11, 2025 at 7:35 PM

Amir-massoud Farahmand

@sologen.bsky.social

Understanding the Effect of Stochasticity in Policy Optimization (NeurIPS 2021) by Jincheng Mei, Bo Dai, Chenjun Xiao, @skiandsolve.bsky.social, Dale Schuurmans.

Interesting paper on Policy Gradient (PG) methods!
PG >> NPG or PG << NPG?! It depends on your estimator.
arxiv.org/abs/2110.15572

October 1, 2025 at 5:57 PM

Amir-massoud Farahmand

@sologen.bsky.social

What are we talking about when we talk about Dynamic Programming?

#ReinforcementLearning

August 3, 2025 at 8:14 PM

Amir-massoud Farahmand

@sologen.bsky.social

Let me introduce Dr. Claas Voelcker! @cvoelcker.bsky.social

Claas defended his PhD on Friday (Learning to model what matters–Representations and world models for efficient reinforcement learning) at the University of Toronto.
Claas is my 3rd graduated PhD (🌟🌟🌟), and I'm so proud of him!

Screenshot at the end of the PhD defence session. In the picture, there are Claas Voelcker, Igor Gilitschenski, William Cunningham, Philip Thomas, Florian Shkurti, Frank Kschischang, and Amir-massoud Farahmand.

July 28, 2025 at 5:30 PM

Amir-massoud Farahmand

@sologen.bsky.social

Also thanks to Rich Zemel for being a supportive co-supervisor,
@nicolaspapernot.bsky.social for being an expert PhD committee member, the internal examiner (Vardan Papyan), the external examiner (David Evans), and the chair (George Eleftheriades) for all being very helpful.

July 28, 2025 at 5:24 PM

Amir-massoud Farahmand

@sologen.bsky.social

While folks are discussing how well LLMs are solving IMO problems (however impressive, still sub-Bronze), I am hoping to see some of these brilliant students in my application pool in 4 years!

July 19, 2025 at 5:14 AM

Amir-massoud Farahmand

@sologen.bsky.social

What do we talk about when we talk about the Bellman Optimality Equation?

If we think carefully, we are (implicitly) making three claims.

#FoundationsOfReinforcementLearning #sneakpeek

First, we claim that there exists a unique value function $\Vopt$ that satisfies the following equation: For any $x \in \XX$, we have
\begin{align*}
\Vopt(x) = \max_{a \in \AA} \left\{ r(x,a) + \gamma \int \PKernel(\dx' | x, a) \Vopt(x') \right\}.
\end{align*}
This claim alone, however, does not show that this $\Vopt$ is the same as $V^{\piopt}$.

The second claim is that $\Vopt$ is indeed the same as $V^{\piopt}$, the optimal value function when $\pi$ is restricted to be within the space of stationary policies. This claim alone, however, does not preclude the possibility that we can find an even more performant policy by going beyond the space of stationary policies.

The third claim is that for discounted continuing MDPs, we can always find a stationary policy that is optimal within the space of all stationary and non-stationary policies.

These three claims together show that the Bellman optimality equation reveals the recursive structure of the optimal value function $\Vopt = V^{\piopt}$. There is no policy, stationary or non-stationary, with a value function better than $\Vopt$, for the class of discounted continuing MDPs.
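The first two claims can be checked numerically on a small example. The sketch below assumes a hypothetical 2-state, 2-action MDP (the rewards `R`, transition probabilities `P`, and discount `GAMMA` are made up), so the integral becomes a finite sum. It runs value iteration from two different starting points and then evaluates the greedy stationary policy. It does not touch the third claim, since it never considers non-stationary policies.

```python
GAMMA = 0.9

# Hypothetical MDP: R[x][a] is the reward, P[x][a][x2] the transition probability.
R = [[1.0, 0.0], [0.0, 2.0]]
P = [
    [[0.8, 0.2], [0.1, 0.9]],
    [[0.5, 0.5], [0.3, 0.7]],
]
STATES = range(2)
ACTIONS = range(2)

def q_value(V, x, a):
    # r(x, a) + gamma * sum over next states of P(x2 | x, a) * V(x2)
    return R[x][a] + GAMMA * sum(p * v for p, v in zip(P[x][a], V))

def bellman_optimality(V):
    # Right-hand side of the Bellman optimality equation, for every state.
    return [max(q_value(V, x, a) for a in ACTIONS) for x in STATES]

def value_iteration(V0, iters=500):
    V = list(V0)
    for _ in range(iters):
        V = bellman_optimality(V)
    return V

def greedy_policy(V):
    # A stationary deterministic policy that is greedy with respect to V.
    return [max(ACTIONS, key=lambda a: q_value(V, x, a)) for x in STATES]

def evaluate_policy(pi, iters=500):
    # Iterative policy evaluation for the stationary policy pi.
    V = [0.0 for _ in STATES]
    for _ in range(iters):
        V = [q_value(V, x, pi[x]) for x in STATES]
    return V

V_a = value_iteration([0.0, 0.0])
V_b = value_iteration([100.0, -50.0])   # a very different starting point
pi = greedy_policy(V_a)
V_pi = evaluate_policy(pi)
```

In this example, both runs of value iteration agree up to numerical tolerance (consistent with uniqueness of the fixed point), and the value of the greedy stationary policy matches that fixed point (consistent with the second claim).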

July 8, 2025 at 11:07 PM

Amir-massoud Farahmand

@sologen.bsky.social

We introduce PANDAS🐼, a jailbreaking method that exploits LLMs' long-context capabilities!
PANDAS significantly outperforms many-shot jailbreaking by introducing:
✅Positive affirmations
❌Negative demonstrations
🎯Adaptive demo sampling
Paper: arxiv.org/abs/2502.01925

This figure shows how PANDAS improves many-shot jailbreaking. The left side portrays a typical conversation, showcasing the Positive Affirmation phrases and Negative Demonstration phrases. The right side shows how the distribution of topics is adaptively changed.

February 7, 2025 at 5:08 PM

Amir-massoud Farahmand

@sologen.bsky.social

Be careful when you use ChatGPT as a math-buddy! It can BS confidently and strongly!

A discussion between me and ChatGPT on whether, for an operator E, the composite operator E(I + E) is the same as E + E^2. The answer is no. ChatGPT was wrong at first and insisted on being wrong for a while until guided through a simple counter-example.
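The counter-example used in the conversation is not given in the post. A minimal one, assuming E is allowed to be nonlinear and reading E^2 as E composed with itself, takes E to be the squaring map on real numbers. The function names below are hypothetical.

```python
def E(x):
    # Assumed nonlinear operator: squaring.
    return x * x

def L(x):
    # A linear operator, for contrast.
    return 3.0 * x

def composite(op, x):
    # (op ∘ (I + op))(x) = op(x + op(x))
    return op(x + op(x))

def sum_form(op, x):
    # (op + op^2)(x) = op(x) + op(op(x))
    return op(x) + op(op(x))

x = 1.0
nonlinear_composite = composite(E, x)   # (1 + 1)^2
nonlinear_sum = sum_form(E, x)          # 1 + 1
linear_composite = composite(L, x)      # 3 * (1 + 3)
linear_sum = sum_form(L, x)             # 3 + 9
```

For the squaring map the two expressions differ at x = 1, while for the linear map they coincide, which is why distributing E over the sum is only safe when E is linear.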

December 6, 2024 at 4:11 AM

Amir-massoud Farahmand

@sologen.bsky.social

🎉Good news, everyone! 🎉
I will recruit graduate students on the algorithmic and theoretical aspects of Reinforcement Learning.
You will join Adage, @mila-quebec.bsky.social and @polymtl.bsky.social.
More info on why and how you should apply:
academic.sologen.net/2024/11/22/g...
Deadline: Dec 1st

November 29, 2024 at 3:23 AM
