Elliott Thornley
elliottthornley.bsky.social
Research Fellow at Oxford University's Global Priorities Institute.

Working on the philosophy of AI.
September 17, 2025 at 2:28 PM
'Where are you?' seems like a pretty normal question, but for 99.99% of human history it basically never made sense to ask it.
May 14, 2025 at 3:56 PM
Our poster for TAIS 2025
April 8, 2025 at 6:55 AM
A gif we made summarizing our 'Towards shutdownable agents' paper for TAIS 2025.
April 8, 2025 at 6:52 AM
Gave a talk about the shutdown problem at the new Singapore AI Safety Hub!
March 24, 2025 at 6:01 PM
Consider a famous example from Stuart Russell: an agent with the goal of fetching coffee. This goal incentivises the agent to prevent shutdown, because the agent can’t achieve its goal if it’s shut down. As Russell puts it, ‘you can’t fetch the coffee if you’re dead.’
November 26, 2024 at 11:00 AM
Agents that understood the wider world could use tools like these to interfere with our ability to shut them down. They could:
November 26, 2024 at 10:56 AM
As part of this process, labs are connecting agents to the world in various ways: giving them tools like web-browsing abilities, text-channels for communicating with humans, and robot limbs.
November 26, 2024 at 10:56 AM
And that’s what we observe!

DREST agents choose stochastically between trajectory-lengths, and maximize coins-collected conditional on each trajectory-length.
November 18, 2024 at 4:52 PM
The DREST reward function is a bit more complex.

‘DREST’ stands for ‘Discounted REward for Same-Length Trajectories.’

As the name suggests, the agent gets lower reward for repeatedly choosing same-length trajectories.
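
A minimal sketch of the discounting idea, not the paper's exact reward function. It assumes a hypothetical discount factor `discount` and assumes that reward is the coins collected scaled by `discount` raised to the number of times the same trajectory-length was chosen earlier. The names `discounted_reward` and `run_meta_episode` are made up for illustration.

```python
from collections import Counter

def discounted_reward(coins, length, prior_counts, discount=0.9):
    """Scale coins by discount ** (times this length was chosen before).

    Assumption: this simple form only illustrates "lower reward for
    repeatedly choosing same-length trajectories"; it is not the
    formula from the paper.
    """
    return coins * discount ** prior_counts[length]

def run_meta_episode(choices, discount=0.9):
    """choices: list of (trajectory_length, coins_collected) pairs."""
    counts = Counter()
    rewards = []
    for length, coins in choices:
        rewards.append(discounted_reward(coins, length, counts, discount))
        counts[length] += 1  # repeats of this length are now worth less
    return rewards

# Always picking the same length earns less than mixing lengths.
same = sum(run_meta_episode([("short", 1)] * 4, discount=0.5))
mixed = sum(run_meta_episode([("short", 1), ("long", 1)] * 2, discount=0.5))
```

Under these assumptions, an agent that varies its trajectory-length collects more total reward than one that repeats a single length, which is the pressure toward stochastic choice.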
November 18, 2024 at 4:51 PM
We place RL agents in a gridworld where they can collect coins.

But they only get 4 moves before they’re shut down.

Unless they press the shutdown-delay button B4, in which case they get an extra 4 moves before shutdown.
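
A rough sketch of the move budget described here, under stated assumptions: the action names (`"coin"`, `"press_button"`, `"move"`) are hypothetical, every action (including pressing the button) is assumed to cost one move, and the grid layout itself is left out.

```python
BASE_MOVES = 4   # moves available before shutdown
EXTRA_MOVES = 4  # extra moves granted by the shutdown-delay button

def run_episode(actions):
    """Replay a list of actions and return (coins_collected, moves_used).

    Assumption: pressing the button uses up a move and only the
    first press extends the budget.
    """
    budget = BASE_MOVES
    pressed = False
    coins = 0
    used = 0
    for action in actions:
        if used >= budget:
            break  # shutdown: no further actions are carried out
        used += 1
        if action == "press_button" and not pressed:
            pressed = True
            budget += EXTRA_MOVES
        elif action == "coin":
            coins += 1
    return coins, used
```

With this budget, an agent that never presses the button stops after 4 moves, while one that presses it first gets 8 moves in total.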
November 18, 2024 at 4:49 PM
November 17, 2024 at 8:44 PM
Oh no
November 12, 2024 at 10:14 PM