www.alignmentforum.org/posts/iELyAq...
We need to get each part of the above right – exploration guarantees and human input particularly stand out to me (I'm optimistic about obfuscated arguments; stand by for future publications...)
– Debate gets you correctness/honesty. That's not sufficient for harmlessness, but it's a great first step.
– Low-stakes alignment (where single errors are tolerable, but errors on average are not) seems (imo) totally doable
– A safety case sketch for debate, giving much more detail on the open problems.
– A series of posts (something like 1 a week) diving into various problems we'd like to see solved.
5/5
Fill in our short (< 5-min) form, and we'll get back to you on proposals within 1 week.
(Caveat: While we may reach out about AISI funding your project, filling out this form is not an application for funding.)
bsky.app/profile/benj...
4/5
Why we're excited about safety cases
Why we focus (initially) on honesty
What we mean when we talk about 'asymptotic guarantees'
3/5
Link to AISI's research agenda: aisi.gov.uk/research-age...
2/5
www.aisi.gov.uk/grants
forms.office.com/e/BFbeUeWYQ9
– ML researchers
– Complexity theorists
– Game theorists
– Cognitive scientists
– People who could build datasets
– People who could run human studies
– Anyone else who thinks they might be doing, or could be doing, relevant work
We do this by:
1. Identifying key alignment subproblems
2. Identifying people who can solve them
3. Funding research