Benjamin Hilton
@benjamin-hilton.bsky.social
Alignment @AISI. Semi-informed about economics, physics and governments. Views my own.
As always, we'd be very excited to collaborate on further research. If you're interested in collaborating with UK AISI, you can express interest at forms.office.com/e/BFbeUeWYQ9. If you're a non-profit or academic, you can also apply for grants up to £200,000 directly at aisi.gov.uk/grants.
May 14, 2025 at 3:39 PM
Huge thanks to Marie Buhl, Jacob Pfau and @girving.bsky.social for all their work on this. Excited to get stuck into future work!
May 8, 2025 at 5:23 PM
There are still loads of open problems.

We need to get each part of the above right – exploration guarantees and human input particularly stand out to me (optimistic about obfuscated arguments, stand by for future publications...)
May 8, 2025 at 5:23 PM
Two things that stand out for me from this paper:
– Debate gets you correctness/honesty. That's not sufficient for harmlessness, but is a great first step.
– Low-stakes alignment (where single errors are tolerable, but errors on average are not) seems (imo) totally doable
May 8, 2025 at 5:23 PM
This is just the start. We'll be following this up shortly with:
– A safety case sketch for debate, giving a whole host more detail on the open problems.
– A series of posts (something like 1 a week) diving into various problems we'd like to see solved.

5/5
May 7, 2025 at 5:56 PM
We've included a long list of open problems we'd like people to solve – and a reminder that you can express interest in collaborating, and apply to our challenge fund for grant funding!

bsky.app/profile/benj...

4/5
Interested in getting UK AISI support to do alignment research?

Fill in our short, <5-minute form, and we'll get back to you on proposals within 1 week.

(Caveat: While we may reach out about AISI funding your project, filling out this form is not an application for funding.)
May 7, 2025 at 5:56 PM
The post sets out:

– Why we're excited about safety cases
– Why we focus (initially) on honesty
– What we mean when we talk about 'asymptotic guarantees'

3/5
May 7, 2025 at 5:56 PM
You can also apply directly for funding via the AISI Challenge Fund:
www.aisi.gov.uk/grants
April 16, 2025 at 4:39 PM
Identifying people to work with is the biggest bottleneck for the UK AISI alignment team right now. Help out by filling in or sharing the form below:
forms.office.com/e/BFbeUeWYQ9
April 16, 2025 at 4:39 PM
We’re particularly excited to hear from:
– ML researchers
– Complexity theorists
– Game theorists
– Cognitive scientists
– People who could build datasets
– People who could run human studies
– Anyone else who thinks they might be doing, or could be doing, relevant work
April 16, 2025 at 4:39 PM
We’re trying to massively scale up the total global effort going into security-relevant alignment research, to prevent superhuman AI from posing critical risk.

We do this by:
1. Identifying key alignment subproblems
2. Identifying people who can solve them
3. Funding research
April 16, 2025 at 4:39 PM