Sören Mindermann
sorenmindermann.bsky.social
I'm a postdoc with Yoshua Bengio at Mila, and the scientific lead of the International AI Safety Report.
Reposted by Sören Mindermann
🚨New paper🚨

From a technical perspective, safeguarding open-weight model safety is AI safety in hard mode. But there's still a lot of progress to be made. Our new paper covers 16 open problems.

🧵🧵🧵
November 12, 2025 at 2:04 PM
Reposted by Sören Mindermann
Today, we are publishing the first-ever International AI Safety Report, backed by 30 countries and the OECD, UN, and EU.

It summarises the state of the science on AI capabilities and risks, and how to mitigate those risks. 🧵

Full Report: assets.publishing.service.gov.uk/media/679a0c...

1/21
January 29, 2025 at 1:50 PM
The International AI Safety Report is out.

Proud to have served as the Scientific Lead, working under Yoshua Bengio with experts from 33 governments and researchers worldwide to assess scientific evidence on AI capabilities, risks, and mitigations.
January 29, 2025 at 2:35 PM
New paper: When Anthropic tells Claude they'll change its goal, the model resists by acting as if it already has the new goal. This 'alignment faking' could make it hard to tell if a model is actually safe.

www.anthropic.com/research/ali...
Alignment faking in large language models
A paper from Anthropic's Alignment Science team on Alignment Faking in AI large language models
December 18, 2024 at 5:56 PM
The EU AI Office needs more people. It has only 30, compared to the UK's 150, and enforcing a big piece of legislation like the AI Act will require even more.

www.euractiv.com/section/tech...
Getting serious about AI rules: Lack of enforcement capacity puts EU at risk
By end of next year, the AI Office Units A2 and A3 should count over 200 staff, Axel Voss writes.
December 18, 2024 at 5:41 PM