Elliott Thornley
@elliottthornley.bsky.social
Research Fellow at Oxford University's Global Priorities Institute.

Working on the philosophy of AI.
September 17, 2025 at 2:28 PM
And the 'Read Aloud' function doesn't work on footnotes either!
December 4, 2024 at 11:28 AM
It's kind of crazy how neglected *all* the arguments for utilitarianism are. Even many philosophers think 'People only believe utilitarianism because it's simple and has a mathsy vibe.'
December 2, 2024 at 7:38 PM
All good points!
December 2, 2024 at 10:01 AM
It's better socially for academics to produce things that are small and good rather than big and bad.

I also think it's easier to start with something small and good and later make it bigger. It's harder to start with something big and bad and later make it better.
December 1, 2024 at 10:32 PM
Yeah I think trying to solve a famous, centuries-old problem in a PhD thesis is prudentially a bad bet.

Maybe socially good to start off ambitious but even then I'm not sure. Might be better for academics to scale their ambitions later.
December 1, 2024 at 10:30 PM
Reposted by Elliott Thornley
"Sure, the last 1000 grad students failed to solve the problem of induction, but that's no reason to think I can't do it."
November 26, 2024 at 9:16 PM
That said, I don't think we're justified in being *very* confident for future AIs (partly for goal misgeneralization reasons). And that's bad.
December 1, 2024 at 9:45 PM
I think results like these justify us in being fairly confident that most LLMs will continue to (by and large) do what their trainers intend.
December 1, 2024 at 9:45 PM
They generalize well on many datasets, even though there are loads of functions they could learn that would lead to bad generalization (arxiv.org/abs/2306.17844).
December 1, 2024 at 9:44 PM
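To make the "loads of functions" point concrete, here is a toy back-of-the-envelope count (made-up sizes, not anything from the linked paper): with 10-bit inputs and 200 labelled training examples, an astronomical number of boolean functions still fit the training set perfectly.

```python
import math

# Toy count: boolean functions f: {0,1}^n -> {0,1} that agree with a
# training set of m distinct labelled inputs. The remaining 2**n - m
# inputs can be labelled freely, giving 2**(2**n - m) consistent functions.
n, m = 10, 200
log10_consistent = (2**n - m) * math.log10(2)
print(f"~10^{log10_consistent:.0f} boolean functions fit all {m} training points")
# ~10^248; almost all of them generalize badly, yet trained networks
# tend not to land on them.
```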
For example, they're biased towards learning simpler functions (arxiv.org/abs/2006.15191) and low-frequency functions: functions whose outputs change slowly with their inputs (proceedings.mlr.press/v97/rahaman1...).
[Linked paper: "Is SGD a Bayesian sampler? Well, almost" (arxiv.org)]
December 1, 2024 at 9:44 PM
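A minimal sketch of the low-frequency bias, assuming PyTorch (a toy setup, not the cited papers' experiments): a small MLP trained to fit the sum of a low-frequency and a high-frequency sine typically picks up the low-frequency component much earlier in training.

```python
import math
import torch
import torch.nn as nn

torch.manual_seed(0)

# Target: a low-frequency plus a high-frequency sine on [0, 1).
N = 256
x = (torch.arange(N).float() / N).unsqueeze(1)
freqs = [1, 10]
y = sum(torch.sin(2 * math.pi * k * x) for k in freqs)

model = nn.Sequential(nn.Linear(1, 64), nn.ReLU(),
                      nn.Linear(64, 64), nn.ReLU(),
                      nn.Linear(64, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

def amplitudes(pred):
    # Amplitude of each target frequency in the model's current fit,
    # read off the FFT of the prediction over the full period.
    spectrum = torch.fft.rfft(pred.squeeze(-1))
    return [2 * spectrum[k].abs().item() / N for k in freqs]

for step in range(3001):
    opt.zero_grad()
    pred = model(x)
    loss = ((pred - y) ** 2).mean()
    loss.backward()
    opt.step()
    if step % 500 == 0:
        low, high = amplitudes(pred.detach())
        print(f"step {step:4d}  loss {loss.item():.3f}  "
              f"amp@f=1: {low:.2f}  amp@f=10: {high:.2f}")
# Typically amp@f=1 approaches 1 well before amp@f=10 does: the network
# fits the slowly varying component first.
```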
With the problem of induction set aside, we can draw on past observations about the functions that LLMs tend to learn.
December 1, 2024 at 9:44 PM
But even though we can't be certain what function an LLM implements by observing its behavior, I think we can be justified in thinking that some functions are more likely than others.
December 1, 2024 at 9:43 PM
I'm more compelled by the points about the complexity of LLMs and how, for all we know, they could be implementing a wide range of functions. There's been some discussion of this issue under the labels 'goal misgeneralization' and 'inner alignment.'
December 1, 2024 at 9:42 PM
But people don't think painting a wall green is provably impossible. I think most everyone agrees that we can be highly confident we've painted a wall green. The problem of induction just asks *why* we can be highly confident.
December 1, 2024 at 9:42 PM
If alignment is provably impossible in virtue of the problem of induction, then painting a wall green is also provably impossible. After all, any can of paint you buy could be grue, or gred, or grellow, etc.
December 1, 2024 at 9:41 PM
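A toy Bayesian sketch of that reply (made-up prior and numbers, not anything from the thread itself): even if prior mass is spread over many grue-like hypotheses about the paint, repeated green observations leave "it looks green next time" close to certain, because no single switch time gets much mass.

```python
# Hypotheses about a can of paint: "green" (always looks green) and
# "grue_t" for t = 1..T (looks green through time t, blue from t+1 on).
# The prior below is made up: half the mass on "green", the rest split
# evenly over the grue variants (a crude simplicity prior, since
# specifying the switch time t costs extra bits).
T = 1000
prior = {"green": 0.5}
for t in range(1, T + 1):
    prior[f"grue_{t}"] = 0.5 / T

def fits(hyp, n):
    """Does hyp predict the paint looked green at times 1..n?"""
    if hyp == "green":
        return True
    switch = int(hyp.split("_")[1])
    return switch >= n  # still green through time n

n = 50  # we've watched the paint look green 50 times
posterior = {h: p for h, p in prior.items() if fits(h, n)}
z = sum(posterior.values())
posterior = {h: p / z for h, p in posterior.items()}

# Probability the paint looks green at time n + 1 as well:
p_green_next = sum(p for h, p in posterior.items() if fits(h, n + 1))
print(f"P(green at t={n+1} | green at t=1..{n}) = {p_green_next:.4f}")
# Comes out above 0.99: lots of grue hypotheses survive the evidence,
# but no single switch time gets enough mass to make "blue next" credible.
```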