Lightnews — Scholar-powered news

@vkrakovna.bsky.social

Research scientist in AI alignment at Google DeepMind. Co-founder of Future of Life Institute. Views are my own and do not represent GDM or FLI.

Posts Replies Media Videos

vkrakovna.bsky.social

@vkrakovna.bsky.social

For example, in one stealth evaluation, a model acting as a personal assistant had to make a user miss a meeting and then cover its tracks by deleting emails and logs. Models performed significantly worse than humans on this challenge.

July 8, 2025 at 12:11 PM

vkrakovna.bsky.social

@vkrakovna.bsky.social

As models advance, a key AI safety concern is deceptive alignment / "scheming" – where AI might covertly pursue unintended goals. Our paper "Evaluating Frontier Models for Stealth and Situational Awareness" assesses whether current models can scheme. arxiv.org/abs/2505.01420

July 8, 2025 at 12:11 PM

Add to Home Screen

Light up
your news

Add to Home Screen

Light upyour news

Sign in to Lightnews

Sign up to start reading

Connect Bluesky

Connect with Bluesky

Light up
your news