Benjamin Hilton
@benjamin-hilton.bsky.social
Alignment @AISI. Semi-informed about economics, physics and governments. Views my own.
Reposted by Benjamin Hilton
I am very excited that AISI is announcing over £15M in funding for AI alignment and control, in partnership with other governments, industry, VCs, and philanthropists!

Here is a 🧵 about why it is important to bring more independent ideas and expertise into this space.

alignmentproject.aisi.gov.uk
The Alignment Project by AISI — The AI Security Institute
The Alignment Project funds groundbreaking AI alignment research to address one of AI’s most urgent challenges: ensuring advanced systems act predictably, safely, and for society’s benefit.
July 30, 2025 at 11:53 AM
Humans are often very wrong.

This is a big problem if you want to use human judgment to oversee super-smart AI systems.

In our new post, @girving.bsky.social argues that we might be able to deal with this issue – not by fixing the humans, but by redesigning oversight protocols.
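The linked post lays out the actual protocol designs. Purely as a toy, hypothetical illustration of the general point that a protocol (rather than the individual humans) can absorb error, here is a sketch of how aggregating many independent, error-prone judgments drives a protocol's error rate down. The majority-vote mechanism and the numbers are illustrative only, not the post's proposal.

```python
# Toy illustration only: NOT the mechanism from the post. The point is just
# that a protocol which aggregates many independent, error-prone judgments can
# be far more reliable than any single judgment. All numbers are made up.
from math import comb

def majority_error(per_judgment_error: float, n_judgments: int) -> float:
    """Probability that a simple majority over n independent judgments is wrong."""
    needed = n_judgments // 2 + 1                 # votes needed to carry the majority
    p_right = 1 - per_judgment_error
    p_majority_right = sum(
        comb(n_judgments, k) * p_right**k * per_judgment_error**(n_judgments - k)
        for k in range(needed, n_judgments + 1)
    )
    return 1 - p_majority_right

print(majority_error(0.30, 1))              # a single judge: wrong 30% of the time
print(round(majority_error(0.30, 25), 3))   # 25 independent judges: wrong ~2% of the time
```

The harder question the post tackles is how to get a similar effect when the system being overseen is much smarter than its overseers.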
May 14, 2025 at 3:39 PM
Want to build an aligned ASI? Our new paper explains how to do that, using debate.

Tl;dr:

Debate + exploration guarantees + no obfuscated arguments + good human input = outer alignment

Outer alignment + online training = inner alignment*

* sufficient for low-stakes contexts
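For intuition about what that recipe is plugging together, here is a loose, hypothetical sketch of the basic debate loop it relies on. The interface names are invented for illustration, and the paper's actual protocol and guarantees are considerably more involved.

```python
# A minimal, hypothetical sketch of a debate protocol (class and function names
# are illustrative, not taken from the paper).
from dataclasses import dataclass
from typing import Callable

@dataclass
class Argument:
    claim: str          # the answer this debater defends
    transcript: str     # the arguments and rebuttals it produced

# A debater maps a question (and the opponent's argument so far) to an argument.
Debater = Callable[[str, str], Argument]
# The human judge only has to compare two completed arguments, a much easier
# job than answering the question directly.
Judge = Callable[[str, Argument, Argument], int]  # returns 0 or 1: which side won

def debate(question: str, pro: Debater, con: Debater, judge: Judge) -> str:
    """Run one round of debate and return the answer backed by the winning side.

    The claim (under the paper's assumptions: exploration guarantees, no
    obfuscated arguments, good human input) is that the honest strategy wins at
    equilibrium, so training against the judge's verdict gives an outer-aligned
    objective.
    """
    a = pro(question, "")
    b = con(question, a.transcript)
    winner = judge(question, a, b)
    return (a if winner == 0 else b).claim
```

The key design choice is that the human never answers the original question unaided; they only adjudicate between two arguments that have already done the hard work.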
May 8, 2025 at 5:23 PM
Reposted by Benjamin Hilton
On top of the AISI-wide research agenda published yesterday, we now have a research agenda for the AISI Alignment Team specifically. See Benjamin's thread and the full post for details; here I'll focus on why we should not give up on directly solving alignment, even though it is hard. 🧵
May 8, 2025 at 9:15 AM
The Alignment Team at UK AISI now has a research agenda.

Our goal: solve the alignment problem.
How: develop concrete, parallelisable open problems.

Our initial focus is on asymptotic honesty guarantees (more details in the post).

1/5
May 7, 2025 at 5:56 PM
Interested in getting UK AISI support to do alignment research?

Fill in our short (under 5 minutes) form, and we'll get back to you on proposals within a week.

(Caveat: While we may reach out about AISI funding your project, filling out this form is not an application for funding.)
April 16, 2025 at 4:39 PM