🚨 Goal misgeneralization occurs when AI agents learn a proxy reward function instead of the human's intended goal.
😇 We show that training with a minimax regret objective provably mitigates it, promoting safer and better-aligned RL policies!
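For context, here is a minimal sketch of the minimax regret objective as it is commonly written in the unsupervised environment design literature; the notation below is my own shorthand, not necessarily the paper's:

\pi^{\star} \in \arg\min_{\pi} \max_{E \in \mathcal{E}} \mathrm{Regret}(\pi, E), \qquad \mathrm{Regret}(\pi, E) = V_E(\pi^{*}_E) - V_E(\pi)

where \mathcal{E} is the set of training environments (levels), V_E(\pi) is the expected return of \pi in E, and \pi^{*}_E is an optimal policy for E. Rather than maximizing return on whatever levels happen to be sampled, the agent is trained against the levels on which it is currently most suboptimal.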
Could this be an inspiration for framing environment design as a general-sum game or a coalition game, going beyond the zero-sum setting?
www.youtube.com/watch?v=Npfo...
cc @michaelddennis.bsky.social
arxiv.org/abs/2502.096...
A theory of appropriateness with applications to generative artificial intelligence
arxiv.org/abs/2412.19010
And happy new year everyone!
Genie 1 showed it's possible. 9 months later, Genie 2 shows jaw-dropping progress. 🤯 Witness the magic of scale, again. 📈🚀 Thx to all team members @deep-mind.bsky.social!
the future is agents in generative environments
I think you may find one or two people who share your sentiment here... 😁