Geoffrey Irving
@girving.bsky.social
Chief Scientist at the UK AI Security Institute (AISI). Previously DeepMind, OpenAI, Google Brain, etc.
The UK AI Security Institute ran an Alignment Conference from 29-31 October in London! The goal was to gather a mix of people experienced in and new to alignment, and get into the details of novel approaches to alignment and related problems. Hopefully we helped create some new research bets! 🧵
November 13, 2025 at 5:00 PM
Another strong transition from @matt-levine.bsky.social.
October 23, 2025 at 7:59 PM
New open source library from the UK AI Security Institute! ControlArena lowers the barrier to secure and reproducible AI control research, to boost work on blocking and detecting malicious actions in case AI models are misaligned. In use by researchers at GDM, Anthropic, Redwood, and MATS! 🧵
October 22, 2025 at 6:04 PM
Ominous start to a Wikipedia page about a formula...

en.wikipedia.org/wiki/Fa%C3%A...
September 29, 2025 at 9:02 PM
From near the end of Sleepwalkers, by Christopher Clark, as World War I starts.
August 23, 2025 at 3:40 PM
Short note on relativisation in debate protocols: to model AI training protocols, we need results that hold even if our source of truth (humans for instance) is a black box that can't be introspected. With @benjamin-hilton.bsky.social and Simon Marshall. 🧵

www.alignmentforum.org/posts/XycoFu...
June 26, 2025 at 4:46 PM
New alignment theory paper! We present a new scalable oversight protocol (prover-estimator debate) and a proof that honesty is incentivised at equilibrium (with large assumptions, see 🧵), even when the AIs involved have similar available compute.
June 17, 2025 at 4:52 PM
Going back through old blog posts, and I still love these old cloth collision event visualizations.

naml.us/post/visualizi…
May 31, 2025 at 2:18 PM
AISI's research agenda is out! We cover a variety of topics in the evaluation and mitigation of risks from frontier LLMs, including both work happening at AISI and work we are excited to see others tackle.

www.aisi.gov.uk/research-age...
May 6, 2025 at 10:55 AM
A new gem I just discovered: how to paste an image on top of a pdf in Preview. :)

apple.stackexchange.com/questions/37...
May 6, 2025 at 9:11 AM
Such a difference could be super subtle. Models seem to be able to make impressive inferences from just texture, such as in this image Scott Alexander tried: astralcodexten.com/p/testing-ai...
May 2, 2025 at 12:02 PM
The LLM geoguesser discussions remind me of the trapdoor technique Jonah Brown-Cohen and I were tinkering with for alignment purposes. Say you want to know whether a model can do a task. How could you know this, without being able to verify individual answers? 🧵

x.com/KelseyTuoc/s...
May 2, 2025 at 12:02 PM
Reading my electricity meter seems to have held up as a hard benchmark for VLMs. Here’s o3 thinking for ~6 minutes and arriving at the wrong answer. (o4-mini also gets it wrong.)
April 27, 2025 at 9:46 AM
Please apply if interested!

t.co/AqlwmxvVdH
April 16, 2025 at 4:44 PM
The most beautiful equation in mathematics.
April 16, 2025 at 4:17 AM
February 12, 2025 at 9:27 PM
Once we're in the two-element free group, the paradoxical decomposition can be visualised directly: there's nothing fundamentally infinite about it. Wikipedia has a nice picture and discussion.

en.wikipedia.org/wiki/Banach%...
December 7, 2024 at 12:49 PM
Soon:
December 2, 2024 at 10:54 PM
It gets worse: in 2020 it was a *four* word phrase.

en.m.wikipedia.org/wiki/Word_of...
December 2, 2024 at 8:01 PM
November 28, 2024 at 7:36 PM
Böttcher coordinates map from outside the Mandelbrot set to outside the disk. You can make an animation by zooming, since the coordinates are the identity near infinity.

I made it before I knew the math well, and it's much more satisfying to watch now that it's formalised in github.com/girving/ray.
November 26, 2024 at 8:55 PM
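As a rough illustration of the Böttcher coordinate post above, here is a minimal Python sketch that estimates only the modulus of the coordinate, using the standard escape-time limit |φ(c)| = lim |z_n|^(1/2^(n-1)) with z_1 = c and z_(n+1) = z_n² + c. This is not the code behind the animation or the formalisation in github.com/girving/ray; the function name `bottcher_modulus` and the `max_iter` and `escape` parameters are hypothetical, chosen for this example. The phase is left out because it needs careful branch tracking.

```python
import math


def bottcher_modulus(c: complex, max_iter: int = 1000, escape: float = 1e50):
    """Estimate |phi(c)| for c outside the Mandelbrot set.

    Returns None if the orbit has not escaped after max_iter steps,
    which we treat (as an assumption) as "c is probably in the set".
    """
    z = c          # z_1 = c
    scale = 1.0    # 1 / 2^(n-1) for the current z_n
    for _ in range(max_iter):
        if abs(z) > escape:
            # Once |z_n| is huge, |z_n|^(1/2^(n-1)) has essentially converged.
            return math.exp(scale * math.log(abs(z)))
        z = z * z + c
        scale *= 0.5
    return None


# Far from the set the map is close to the identity, so |phi(c)| is close to |c|.
far = bottcher_modulus(1000 + 0j)
print(far)

# Points in the set (here c = 0 and c = -1) never escape.
print(bottcher_modulus(0j), bottcher_modulus(-1 + 0j))
```

Points outside the set should come back with modulus greater than 1, matching "outside the Mandelbrot set to outside the disk", and the value approaches |c| as |c| grows, which is the "identity near infinity" property that makes the zoom animation possible.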
Until then...
November 24, 2024 at 1:02 AM
Hmm, that was supposed to be an animated gif, but it doesn't animate. Here's a (sadly non-looping) video version.
November 24, 2024 at 1:00 AM
Hmm, does this work? For me it plays as an animated gif as I draft the post, and then doesn't look animated once posted.
November 24, 2024 at 12:48 AM
November 24, 2024 at 12:22 AM