Lightnews — Scholar-powered news

Reposted by Sören Mindermann

Cas (Stephen Casper)

@scasper.bsky.social

🚨New paper🚨

From a technical perspective, safeguarding open-weight model safety is AI safety in hard mode. But there's still a lot of progress to be made. Our new paper covers 16 open problems.

🧵🧵🧵

November 12, 2025 at 2:04 PM

Reposted by Sören Mindermann

Yoshua Bengio

@yoshuabengio.bsky.social

Today, we are publishing the first-ever International AI Safety Report, backed by 30 countries and the OECD, UN, and EU.

It summarises the state of the science on AI capabilities and risks, and how to mitigate those risks. 🧵

Full Report: assets.publishing.service.gov.uk/media/679a0c...

1/21

January 29, 2025 at 1:50 PM

Sören Mindermann

@sorenmindermann.bsky.social

The International AI Safety Report is out.

Proud to have served as the Scientific Lead, working under Yoshua Bengio with experts from 33 governments and researchers worldwide to assess scientific evidence on AI capabilities, risks, and mitigations.

Yoshua Bengio @yoshuabengio.bsky.social · Jan 29

Today, we are publishing the first-ever International AI Safety Report, backed by 30 countries and the OECD, UN, and EU.

It summarises the state of the science on AI capabilities and risks, and how to mitigate those risks. 🧵

Full Report: assets.publishing.service.gov.uk/media/679a0c...

1/21

January 29, 2025 at 2:35 PM

Sören Mindermann

@sorenmindermann.bsky.social

New paper: When Anthropic tells Claude they'll change its goal, the model resists by acting as if it already has the new goal. This 'alignment faking' could make it hard to tell if a model is actually safe.

www.anthropic.com/research/ali...

Alignment faking in large language models

A paper from Anthropic's Alignment Science team on Alignment Faking in AI large language models

www.anthropic.com

December 18, 2024 at 5:56 PM

Sören Mindermann

@sorenmindermann.bsky.social

The EU AI Office needs more people. It has only 30 staff compared to the UK's 150, and enforcing a big piece of legislation like the AI Act will require even more.

www.euractiv.com/section/tech...

Getting serious about AI rules: Lack of enforcement capacity puts EU at risk

By end of next year, the AI Office Units A2 and A3 should count over 200 staff, Axel Voss writes.

www.euractiv.com

December 18, 2024 at 5:41 PM
