Lightnews — Scholar-powered news

Karim Abdel Sadek

@karimabdel.bsky.social

190 followers 94 following 16 posts

Incoming PhD, UC Berkeley

Interested in RL, AI Safety, Cooperative AI, TCS

https://karim-abdel.github.io

Karim Abdel Sadek

@karimabdel.bsky.social

The paper "Mitigating goal misgeneralization via minimax regret" will appear at @rl-conference.bsky.social!

Joint work with the great Matthew Farrugia-Roberts, Usman Anwar, Hannah Erlebach, Christian Schroeder de Witt, David Krueger, and @michaelddennis.bsky.social

www.arxiv.org/pdf/2507.03068

July 8, 2025 at 5:16 PM

Karim Abdel Sadek

@karimabdel.bsky.social

Future work we are excited about:

• Improving UED algorithms so that their results come closer to those predicted by our theory.

• Mitigating the fully ambiguous case by focusing on the inductive biases of the agent.

July 8, 2025 at 5:16 PM

Karim Abdel Sadek

@karimabdel.bsky.social

We also visualize the performance of our agents in a maze for each possible location of the goal in the environment.

The results show that agents trained with the regret objective achieve near-maximum return for almost all goal locations.

July 8, 2025 at 5:16 PM

Karim Abdel Sadek

@karimabdel.bsky.social

We complement our theoretical findings with empirical results. These support our theory: agents trained via minimax regret generalize better.

Left: performance at test time
Right: % of distinguishing levels played by the respective level designer

July 8, 2025 at 5:16 PM

Karim Abdel Sadek

@karimabdel.bsky.social

In the case where the environments in deployment are in the support of the training level distribution, we also show that a policy that is optimal with respect to the minimax regret objective must provably be robust against goal misgeneralization!

July 8, 2025 at 5:16 PM

Karim Abdel Sadek

@karimabdel.bsky.social

We first formally show that a policy maximizing expected value may suffer from goal misgeneralization if distinguishing levels are rare.
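A toy calculation can make this concrete. It is only an illustrative sketch, not the construction from the paper: the probability `eps` of a distinguishing level, the returns of 1.0 and 0.0, and the two hypothetical policies are all assumptions chosen for the example.

```python
def expected_return(p_distinguishing, return_non_distinguishing, return_distinguishing):
    """Expected true return when distinguishing levels appear with probability p_distinguishing."""
    return ((1 - p_distinguishing) * return_non_distinguishing
            + p_distinguishing * return_distinguishing)

# Assumed toy numbers: distinguishing levels make up 1% of training levels.
eps = 0.01

# Hypothetical policy that pursues the true goal: full return on every level.
true_goal_value = expected_return(eps, 1.0, 1.0)

# Hypothetical policy that pursues the proxy goal: full return only where
# the proxy and true goals coincide, zero return on distinguishing levels.
proxy_goal_value = expected_return(eps, 1.0, 0.0)

# The expected-value gap between the two policies is only eps.
gap = true_goal_value - proxy_goal_value
print(f"true-goal policy: {true_goal_value:.4f}")
print(f"proxy-goal policy: {proxy_goal_value:.4f}")
print(f"gap: {gap:.4f}")
```

Under these assumptions the gap shrinks with `eps`, so an approximately optimal expected-value learner has little pressure to prefer the true goal over the proxy.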

July 8, 2025 at 5:16 PM

Karim Abdel Sadek

@karimabdel.bsky.social

Goal misgeneralization can occur when training only on non-distinguishing levels, as shown in Langosco et al., 2022.

Adding a few distinguishing levels does not alter this outcome. However, we propose a mitigation for this scenario!

July 8, 2025 at 5:16 PM

Karim Abdel Sadek

@karimabdel.bsky.social

Goal misgeneralization arises due to the presence of ‘proxy goals’. We formalize this and characterize environments as either:

• Non-distinguishing: the true and proxy rewards may induce the same behavior

• Distinguishing: the true and proxy rewards induce different behavior
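The two categories above can be sketched as a simple check. This is a deliberately simplified illustration, not the formal definition from the paper: the `Level` type, its fields, and the idea of summarizing behavior by a single best action per reward are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Level:
    """Hypothetical level summary: the best action under each reward."""
    name: str
    best_action_true: str   # best action under the true reward (assumed known)
    best_action_proxy: str  # best action under the proxy reward (assumed known)

def is_distinguishing(level: Level) -> bool:
    """A level is distinguishing here if the two rewards favor different behavior."""
    return level.best_action_true != level.best_action_proxy

# Assumed toy levels: in the first, goal and proxy lie in the same direction.
levels = [
    Level("goal_and_proxy_together", best_action_true="right", best_action_proxy="right"),
    Level("goal_and_proxy_apart", best_action_true="left", best_action_proxy="right"),
]

distinguishing_names = [lvl.name for lvl in levels if is_distinguishing(lvl)]
print(distinguishing_names)
```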

July 8, 2025 at 5:16 PM

Karim Abdel Sadek

@karimabdel.bsky.social

We propose using regret, the difference between the optimal agent's return and our current policy's return, as a training objective.

Minimizing it will encourage the agent to solve rare out-of-distribution levels during training, helping it learn the correct reward function.
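As a minimal sketch of that definition, the snippet below computes per-level regret and picks the level an adversary would choose to maximize it. The level names and return values are assumed toy numbers, and the function names are hypothetical rather than taken from the paper's code.

```python
def regret(optimal_return: float, policy_return: float) -> float:
    """Regret on one level: the optimal agent's return minus the policy's return."""
    return optimal_return - policy_return

def max_regret_level(optimal_returns: dict, policy_returns: dict):
    """Return the (level, regret) pair with the highest regret for the current policy."""
    regrets = {
        level: regret(optimal_returns[level], policy_returns[level])
        for level in optimal_returns
    }
    worst = max(regrets, key=regrets.get)
    return worst, regrets[worst]

# Assumed toy returns: the policy is optimal on the common level but
# fails on the rare level where the proxy goal and true goal differ.
optimal_returns = {"common_level": 1.0, "rare_level": 1.0}
policy_returns = {"common_level": 1.0, "rare_level": 0.0}

worst_level, worst_regret = max_regret_level(optimal_returns, policy_returns)
print(worst_level, worst_regret)
```

In this toy setting a regret-maximizing level designer would keep presenting the rare level until the policy solves it, however infrequent that level is under the original distribution.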

July 8, 2025 at 5:16 PM

Karim Abdel Sadek

@karimabdel.bsky.social

what if…

February 21, 2025 at 4:31 AM

Karim Abdel Sadek

@karimabdel.bsky.social

lbh gnxr gur yninynzc bhgchg, naq Nyvpr naq Obo qb gur qbg cebqhpg bs vg jvgu gurve erfcrpgvir ahzore naq gura nccyl zbq 2 gb gur erfhyg. Gurl gura pbzzhavpngr gur ovg gurl bognvarq (1=jnir,0=jvax), naq guvf bcrengvba nyjnlf erghea gur fnzr ahzore gb obgu vs n=o be bgurejvfr snvyf jvgu c=1/2?

February 17, 2025 at 6:30 AM

Karim Abdel Sadek

@karimabdel.bsky.social

Here is some cool work taking a first step towards that in Minecraft using MCTS: Scalably Solving Assistance Games - openreview.net/pdf/080f0c69...

November 19, 2024 at 3:26 PM

Karim Abdel Sadek

@karimabdel.bsky.social

Very cool work! I think an important challenge is scaling assistance games to settings where the goal/action/communication space can be 'large', so as to capture the real-world scenarios where we will actually want to apply CIRL.

November 19, 2024 at 3:26 PM

Karim Abdel Sadek

@karimabdel.bsky.social

Here is some cool work taking a first step towards that in Minecraft using MCTS: Scalably Solving Assistance Games - openreview.net/pdf/080f0c69...

November 19, 2024 at 3:22 PM
