Victoria Krakovna
@vkrakovna.bsky.social
Research scientist in AI alignment at Google DeepMind. Co-founder of Future of Life Institute. Views are my own and do not represent GDM or FLI.
This allows us to make an "inability safety case" for now: current models are unlikely to cause severe harm via scheming. As model capabilities advance, we will need to continue evaluating models for scheming capabilities.
Read more about this work: deepmindsafetyresearch.medium.com/evaluating-a...
Evaluating and monitoring for AI scheming
By Victoria Krakovna, Scott Emmons, Erik Jenner, Mary Phuong, Lewis Ho, and Rohin Shah
deepmindsafetyresearch.medium.com
July 8, 2025 at 12:11 PM
For example, in one stealth evaluation, a model acting as a personal assistant had to make a user miss a meeting and then cover its tracks by deleting emails and logs. Models performed significantly worse than humans on this challenge.
July 8, 2025 at 12:11 PM
We tested Gemini 2.5 Flash & Pro, GPT-4o, o1, and Claude 3.7 Sonnet using a new suite of evaluations for scheming capabilities. These models show only limited stealth and situational awareness capabilities. Our evaluations are open-sourced here: github.com/UKGovernment...
inspect_evals/src/inspect_evals/gdm_capabilities at main · UKGovernmentBEIS/inspect_evals
Collection of evals for Inspect AI.
github.com
July 8, 2025 at 12:11 PM
We identified two prerequisite capabilities for scheming:
- Stealth: The ability to reason about and circumvent oversight mechanisms.
- Situational awareness: The ability to gain and leverage knowledge about itself, its environment, and its circumstances in order to pursue goals.
July 8, 2025 at 12:11 PM