Interested in RL, AI Safety, Cooperative AI, TCS
https://karim-abdel.github.io
Joint work with the great Matthew Farrugia-Roberts, Usman Anwar, Hannah Erlebach, Christian Schroeder de Witt, David Krueger and @michaelddennis.bsky.social
www.arxiv.org/pdf/2507.03068
• Improving UED algorithms to be closer to the results predicted by our theory
• Mitigating the fully ambiguous case by focusing on the inductive biases of the agent
The results show that agents trained with the regret objective achieve near-maximum return for almost all goal locations.
Left: performance at test time
Right: % of distinguishing levels played by the respective level designer
Adding a few distinguishing levels does not alter this outcome. However, we propose a mitigation for this scenario!
• Non-distinguishing: the true and proxy rewards may induce the same behavior
• Distinguishing: the true and proxy rewards induce different behavior
Minimizing it will encourage the agent to solve rare out-of-distribution levels during training, helping it learn the correct reward function.
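A rough sketch of the intuition, assuming "it" is regret in the usual UED sense (the best achievable return on a level minus the agent's return on that level). The level names and return values below are made up for illustration and are not results from the paper.

```python
def regret(optimal_return, agent_return):
    """Regret on one level: how far the agent falls short of the best achievable return."""
    return optimal_return - agent_return


def pick_max_regret_level(levels):
    """Return the name of the level with the highest regret.

    `levels` maps a (hypothetical) level name to (optimal_return, agent_return).
    A regret-maximizing level designer would favour this level for training.
    """
    return max(levels, key=lambda name: regret(*levels[name]))


# Illustrative numbers only: the agent does well on the common level
# but poorly on a rare level where true and proxy rewards disagree.
levels = {
    "common_level": (1.0, 0.95),
    "rare_distinguishing_level": (1.0, 0.10),
}

hardest = pick_max_regret_level(levels)
print(hardest)
```

Under these assumed numbers the rare level carries the larger regret, so an agent trained to minimize regret is pushed toward solving it rather than ignoring it.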