Interested in RL, AI Safety, Cooperative AI, TCS
https://karim-abdel.github.io
🚨 Goal misgeneralization occurs when AI agents learn the wrong goal instead of the reward function the human intended.
😇 We show that training with a minimax regret objective provably mitigates it, promoting safer and better-aligned RL policies!
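A toy sketch of the minimax regret idea (illustrative only; the tabular setup, function names, and numbers are mine, not the paper's): prefer the policy whose worst-case regret across training environments is smallest, so a policy that latches onto a proxy goal which fails badly in some environment gets penalised.

```python
import numpy as np

def pick_minimax_regret_policy(returns, optimal_returns):
    """returns[i, j]: return of candidate policy i in environment j.
    optimal_returns[j]: best achievable return in environment j.
    Returns the index of the policy with the smallest worst-case regret."""
    returns = np.asarray(returns, dtype=float)
    optimal_returns = np.asarray(optimal_returns, dtype=float)
    regrets = optimal_returns[None, :] - returns   # regret per (policy, environment)
    worst_case = regrets.max(axis=1)               # max over environments
    return int(worst_case.argmin())                # min over policies

# Toy example: policy 0 exploits a proxy that fails badly in environment 1;
# policy 1 is slightly worse in environment 0 but never far from optimal.
returns = [[1.0, 0.0],
           [0.9, 0.9]]
optimal = [1.0, 1.0]
print(pick_minimax_regret_policy(returns, optimal))  # -> 1
```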
www.cooperativeai.com/post/new-rep...
We performed the largest-ever comparison of these algorithms.
We find that they do not outperform generic policy gradient methods, such as PPO.
arxiv.org/abs/2502.08938
1/N
www.cooperativeai.com/summer-schoo...
Would love to chat about multi-agent systems, RL, human-AI alignment, or anything interesting :)
I'm also applying for PhD programs this cycle, feel free to reach out for any advice!
More about me: karim-abdel.github.io
Great! We know how to do this! This is the von Neumann trick: toss the coin twice; if HH or TT, discard and repeat; if HT or TH, return the first toss.
Problem solved? Not quite... This can be bad!
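A minimal sketch of the trick (the bias value and function names are illustrative):

```python
import random

def biased_flip(p_heads=0.3):
    """A coin that lands heads (True) with probability p_heads."""
    return random.random() < p_heads

def fair_flip(flip=biased_flip):
    """Von Neumann trick: toss twice; on HH or TT, repeat;
    on HT or TH, return the first toss. HT and TH are equally
    likely (each p*(1-p)), so the output is unbiased."""
    while True:
        first, second = flip(), flip()
        if first != second:
            return first
```

In expectation the loop needs 1/(2p(1-p)) pairs of tosses, which blows up for heavily biased coins (presumably the catch the thread hints at).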