Benjamin Hilton
@benjamin-hilton.bsky.social
Alignment @AISI. Semi-informed about economics, physics and governments. Views my own.
Reposted by Benjamin Hilton
I am very excited that AISI is announcing over £15M in funding for AI alignment and control, in partnership with other governments, industry, VCs, and philanthropists!

Here is a 🧵 about why it is important to bring more independent ideas and expertise into this space.

alignmentproject.aisi.gov.uk
The Alignment Project by AISI — The AI Security Institute
The Alignment Project funds groundbreaking AI alignment research to address one of AI’s most urgent challenges: ensuring advanced systems act predictably, safely, and for society’s benefit.
July 30, 2025 at 11:53 AM
Humans are often very wrong.

This is a big problem if you want to use human judgment to oversee super-smart AI systems.

In our new post, @girving.bsky.social argues that we might be able to deal with this issue – not by fixing the humans, but by redesigning oversight protocols.
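The linked post lays out the actual protocol designs. Purely as a toy, hypothetical illustration of the general point that a protocol (rather than the individual humans) can absorb error, here is a sketch of how aggregating many independent, error-prone judgments drives a protocol's error rate down. The majority-vote mechanism and the numbers are illustrative only, not the post's proposal.

```python
# Toy illustration only: NOT the mechanism from the post. The point is just
# that a protocol which aggregates many independent, error-prone judgments can
# be far more reliable than any single judgment. All numbers are made up.
from math import comb

def majority_error(per_judgment_error: float, n_judgments: int) -> float:
    """Probability that a simple majority over n independent judgments is wrong."""
    needed = n_judgments // 2 + 1                 # votes needed to carry the majority
    p_right = 1 - per_judgment_error
    p_majority_right = sum(
        comb(n_judgments, k) * p_right**k * per_judgment_error**(n_judgments - k)
        for k in range(needed, n_judgments + 1)
    )
    return 1 - p_majority_right

print(majority_error(0.30, 1))              # a single judge: wrong 30% of the time
print(round(majority_error(0.30, 25), 3))   # 25 independent judges: wrong ~2% of the time
```

The harder question the post tackles is how to get a similar effect when the system being overseen is much smarter than its overseers.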
May 14, 2025 at 3:39 PM
Want to build an aligned ASI? Our new paper explains how to do that, using debate.

Tl;dr:

Debate + exploration guarantees + no obfuscated arguments + good human input = outer alignment

Outer alignment + online training = inner alignment*

* sufficient for low-stakes contexts
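For intuition about what that recipe is plugging together, here is a loose, hypothetical sketch of the basic debate loop it relies on. The interface names are invented for illustration, and the paper's actual protocol and guarantees are considerably more involved.

```python
# A minimal, hypothetical sketch of a debate protocol (class and function names
# are illustrative, not taken from the paper).
from dataclasses import dataclass
from typing import Callable

@dataclass
class Argument:
    claim: str          # the answer this debater defends
    transcript: str     # the arguments and rebuttals it produced

# A debater maps a question (and the opponent's argument so far) to an argument.
Debater = Callable[[str, str], Argument]
# The human judge only has to compare two completed arguments, a much easier
# job than answering the question directly.
Judge = Callable[[str, Argument, Argument], int]  # returns 0 or 1: which side won

def debate(question: str, pro: Debater, con: Debater, judge: Judge) -> str:
    """Run one round of debate and return the answer backed by the winning side.

    The claim (under the paper's assumptions: exploration guarantees, no
    obfuscated arguments, good human input) is that the honest strategy wins at
    equilibrium, so training against the judge's verdict gives an outer-aligned
    objective.
    """
    a = pro(question, "")
    b = con(question, a.transcript)
    winner = judge(question, a, b)
    return (a if winner == 0 else b).claim
```

The key design choice is that the human never answers the original question unaided; they only adjudicate between two arguments that have already done the hard work.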
May 8, 2025 at 5:23 PM
Reposted by Benjamin Hilton
On top of the AISI-wide research agenda published yesterday, we now have a research agenda for the AISI Alignment Team specifically. See Benjamin's thread and the full post for details; here I'll focus on why we should not give up on directly solving alignment, even though it is hard. 🧵
May 8, 2025 at 9:15 AM
The Alignment Team at UK AISI now has a research agenda.

Our goal: solve the alignment problem.
How: develop concrete, parallelisable open problems.

Our initial focus is on asymptotic honesty guarantees (more details in the post).

1/5
May 7, 2025 at 5:56 PM
Interested in getting UK AISI support to do alignment research?

Fill in our short (under 5 minutes) form, and we'll get back to you on proposals within a week.

(Caveat: While we may reach out about AISI funding your project, filling out this form is not an application for funding.)
April 16, 2025 at 4:39 PM