Marlos C. Machado
@marloscmachado.bsky.social
Assistant Professor at the University of Alberta. Amii Fellow, Canada CIFAR AI chair. Machine learning researcher. All things reinforcement learning.

📍 Edmonton, Canada 🇨🇦
🔗 https://webdocs.cs.ualberta.ca/~machado/

🗓️ Joined November 2024
2/2: “Conquerors live in dread of the day when they are shown to be, not superior, but simply lucky.”

― N.K. Jemisin, The Stone Sky
August 27, 2025 at 2:20 AM
*RLC Journal to Conference Track:*
(Originally published in TMLR)

- Deep RL track (Thu): AGaLiTe: Approximate Gated Linear Transformers for Online Reinforcement Learning by S. Pramanik
August 4, 2025 at 3:49 PM
*RLC Full Papers:*
(These are great papers!)

- Deep RL track (Thu): Deep Reinforcement Learning with Gradient Eligibility Traces by E. Elelimy
- Foundations track (Fri): An Analysis of Action-Value Temporal-Difference Methods That Learn State Values by B. Daley and P. Nagarajan
August 4, 2025 at 3:49 PM
*RLC Workshop Papers (2/2):*
Inductive Biases in RL
sites.google.com/view/ibrl-wo...

- A Study of Value-Aware Eigenoptions by H. Kotamreddy
August 4, 2025 at 3:49 PM
*RLC Workshop Papers (1/2):*
RL Beyond Rewards
rlbrew2-workshop.github.io

- Tue 11:59 (spotlight talk): Towards An Option Basis To Optimize All Rewards by S. Chandrasekar
- The World Is Bigger: A Computationally-Embedded Perspective on the Big World Hypothesis by A. Lewandowski
Workshop on Reinforcement Learning Beyond Rewards: Ingredients for Developing Generalist Agents
August 4, 2025 at 3:49 PM
9/9: I genuinely think AgarCL might unlock new research avenues in CRL, including loss of plasticity, exploration, representation learning, and more. I do hope you consider using it.

Repo: github.com/machado-rese...
Website: agarcl.github.io
Preprint: arxiv.org/abs/2505.18347
May 27, 2025 at 3:48 AM
8/9: Well, if you are still interested, you should probably just read the paper, but it is interesting that most of the agents we considered reached human-level performance only in the most benign settings. And we used a lot of compute here!
May 27, 2025 at 3:48 AM
7/9: Through mini-games, we tried to quantify and isolate some of the challenges AgarCL poses, including partial observability, non-stationarity, exploration, hyperparameter tuning, and the non-episodic nature of the environment (so easy to forget!). Where do our agents "break"?
May 27, 2025 at 3:48 AM
6/9: Importantly, this is a challenge problem that forces us to confront issues we often avoid, such as hyperparameter sweeps and exploration in CRL.

It is perhaps no surprise that the classic algorithms we considered couldn't really make much progress in the full game.
May 27, 2025 at 3:48 AM
5/9: Over time, even the agent's observation will change, as the camera needs to zoom out to accommodate more agents; not to mention that there are other agents in the environment. I'm very excited about AgarCL because I think it allows us to ask questions we couldn't before.
May 27, 2025 at 3:48 AM
4/9: AgarCL is an adaptation of agar.io, a game with simple mechanics that lead to complex interactions. It's non-episodic, and a key aspect is that the agent's dynamics change as it accumulates mass: it becomes slower, gains new affordances, sheds more mass, etc.
May 27, 2025 at 3:48 AM
3/9: AgarCL is our attempt at an environment with the complexity of a "big world" but in a smooth way, where the "laws of physics" don't change. It has complex dynamics, partial observability, non-stationarity, pixel-based observations, and a hybrid action space.
May 27, 2025 at 3:48 AM
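If you want a feel for what interacting with AgarCL could look like, here is a hypothetical sketch. I'm assuming a Gymnasium-style API with a dict action that mixes a continuous move direction and a discrete split/feed choice; the class and field names below are made up for illustration, so check the repo for the actual interface.

```python
# Hypothetical sketch of a non-episodic interaction loop with AgarCL.
# Assumptions (not the actual AgarCL API): a Gymnasium-style env and a dict action
# combining a continuous move direction with a discrete {noop, split, feed} choice.
import numpy as np


class RandomHybridAgent:
    """Samples a random hybrid action: continuous movement target + discrete command."""

    def __init__(self, seed=0):
        self.rng = np.random.default_rng(seed)

    def act(self, observation):
        direction = self.rng.uniform(-1.0, 1.0, size=2)  # continuous: move direction
        command = self.rng.integers(0, 3)                # discrete: 0=noop, 1=split, 2=feed
        return {"target": direction, "command": command}


def run_continuing(env, agent, num_steps=10_000):
    """Single long stream of experience: reset once, then never reset again."""
    obs, info = env.reset(seed=0)
    return_so_far = 0.0
    for _ in range(num_steps):
        obs, reward, terminated, truncated, info = env.step(agent.act(obs))
        return_so_far += reward
    return return_so_far
```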
2/9: CRL is often motivated by the idea that the world is bigger than the agent, requiring tracking. We usually simulate this by inducing non-stationarity, cycling through classic episodic problems (sketched below). I've written papers like this, but it feels too artificial.

arxiv.org/abs/2303.07507
May 27, 2025 at 3:48 AM
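To be concrete about "cycling through classic episodic problems": below is a minimal sketch of that kind of simulated non-stationarity, using Gymnasium classic-control tasks. The task list and switching period are arbitrary illustrative choices, not taken from any particular paper.

```python
# Simulated non-stationarity via task cycling: the environment the agent faces changes
# every EPISODES_PER_TASK episodes, so from the agent's perspective the world is
# non-stationary, even though each underlying task is a standard episodic problem.
# Task list and switching period are arbitrary illustrative choices.
import gymnasium as gym

TASKS = ["CartPole-v1", "Acrobot-v1", "MountainCar-v0"]
EPISODES_PER_TASK = 100


def task_for_episode(episode):
    """Return the Gymnasium id of the task active at a given global episode index."""
    return TASKS[(episode // EPISODES_PER_TASK) % len(TASKS)]


# Episodes 0-99 are CartPole, 100-199 Acrobot, 200-299 MountainCar, then the cycle repeats.
if __name__ == "__main__":
    for episode in (0, 100, 200, 300):
        env = gym.make(task_for_episode(episode))
        obs, info = env.reset(seed=episode)
        print(episode, task_for_episode(episode), obs.shape)
        env.close()
```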
This is great, thanks for sharing! We will read your paper carefully.
May 25, 2025 at 5:58 PM
7/7: We just scratched the surface here, but I think this could be the beginning of something interesting, one that might be relevant to research questions ranging from safety in RL all the way to the cognitive sciences.

Again, here's the preprint by Tse et al.: arxiv.org/abs/2505.16217
Reward-Aware Proto-Representations in Reinforcement Learning
May 24, 2025 at 3:23 PM
6/7: We also show that, when compared to the SR, the DR gives rise to qualitatively different behavior in all sorts of tasks, such as reward shaping, exploration, & option discovery. Similar to what we did w/ STOMP, sometimes there's value in being aware of the reward function 😁
May 24, 2025 at 3:23 PM
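For readers less familiar with the SR: its standard definition, for a fixed policy π with discount γ, is the expected discounted future occupancy of each state, which is exactly what makes it reward-agnostic. The DR is the reward-/cost-aware analogue; I won't restate its exact definition here, see the preprint for that.

```latex
% Standard successor representation (SR) for a fixed policy \pi and discount \gamma:
% the expected discounted future occupancy of state s' when starting in state s.
\Psi^{\pi}(s, s') = \mathbb{E}_{\pi}\!\left[\,\sum_{t=0}^{\infty} \gamma^{t}\,
    \mathbb{1}\{S_t = s'\} \;\middle|\; S_0 = s \right],
\qquad
\Psi^{\pi} = (I - \gamma P^{\pi})^{-1}.

% With rewards written as a function of the state, the value function factors as
% v^{\pi} = \Psi^{\pi} r: all reward information lives in r, outside the representation,
% which is why the SR is reward-agnostic and the DR departs from it.
v^{\pi} = \Psi^{\pi} r.
```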