vkrakovna.bsky.social
@vkrakovna.bsky.social
Research scientist in AI alignment at Google DeepMind. Co-founder of Future of Life Institute. Views are my own and do not represent GDM or FLI.
As models advance, a key AI safety concern is deceptive alignment / "scheming" – where AI might covertly pursue unintended goals. Our paper "Evaluating Frontier Models for Stealth and Situational Awareness" assesses whether current models can scheme. arxiv.org/abs/2505.01420
July 8, 2025 at 12:11 PM
Reposted
Super excited this giant paper outlining our technical approach to AGI safety and security is finally out!

No time to read 145 pages? Check out the 10-page extended abstract at the beginning of the paper.
April 4, 2025 at 2:27 PM
We are excited to release a short course on AGI safety! The course offers a concise and accessible introduction to AI alignment problems and our technical / governance approaches, consisting of short recorded talks and exercises (75 minutes total). deepmindsafetyresearch.medium.com/1072adb7912c
February 14, 2025 at 4:06 PM