Rishub Jain
shubadubadub.bsky.social
Works at Google DeepMind on Safe+Ethical AI
Reposted by Rishub Jain
New Google DeepMind safety paper! LLM agents are coming – how do we stop them finding complex plans to hack the reward?

Our method, MONA, prevents many such hacks, *even if* humans are unable to detect them!

Inspired by myopic optimization, but with better performance – details in 🧵
January 23, 2025 at 3:33 PM
How do we ensure humans can still effectively oversee increasingly powerful AI systems? In our blog, we argue that achieving Human-AI complementarity is an underexplored yet vital piece of this puzzle! And, it’s hard, but we achieved it.

🧵(1/10)
December 24, 2024 at 12:01 AM
Reposted by Rishub Jain
Can someone let me into Croatia’s inside joke
May 13, 2023 at 9:14 PM