Here is a 🧵 about why it is important to bring more independent ideas and expertise into this space.
alignmentproject.aisi.gov.uk
This is a big problem if you want to use human judgment to oversee super-smart AI systems.
In our new post, @girving.bsky.social argues that we might be able to deal with this issue – not by fixing the humans, but by redesigning oversight protocols.
TL;DR:
Debate + exploration guarantees + no obfuscated arguments + good human input = outer alignment
Outer alignment + online training = inner alignment*
* sufficient for low-stakes contexts
Our goal: solve the alignment problem.
How: develop concrete, parallelisable open problems.
Our initial focus is on asymptotic honesty guarantees (more details in the post).
1/5
Fill in our short (< 5-min) form, and we'll get back to you on proposals within 1 week.
(Caveat: While we may reach out about AISI funding your project, filling out this form is not an application for funding.)