Elliott Thornley
@elliottthornley.bsky.social
Research Fellow at Oxford University's Global Priorities Institute.

Working on the philosophy of AI.
September 17, 2025 at 2:28 PM
And the 'Read Aloud' function doesn't work on footnotes either!
December 4, 2024 at 11:28 AM
It's kind of crazy how neglected *all* the arguments for utilitarianism are. Even many philosophers think 'People only believe utilitarianism because it's simple and has a mathsy vibe.'
December 2, 2024 at 7:38 PM
All good points!
December 2, 2024 at 10:01 AM
It's better socially for academics to produce things that are small and good rather than big and bad.

I also think it's easier to start with something small and good and later make it bigger. It's harder to start with something big and bad and later make it better.
December 1, 2024 at 10:32 PM
Yeah I think trying to solve a famous, centuries-old problem in a PhD thesis is prudentially a bad bet.

Maybe socially good to start off ambitious but even then I'm not sure. Might be better for academics to scale their ambitions later.
December 1, 2024 at 10:30 PM
Reposted by Elliott Thornley
"Sure, the last 1000 grad students failed to solve the problem of induction, but that's no reason to think I can't do it."
November 26, 2024 at 9:16 PM
That said, I don't think we're justified in being *very* confident for future AIs (partly for goal misgeneralization reasons). And that's bad.
December 1, 2024 at 9:45 PM
I think results like these justify us in being fairly confident that most LLMs will continue to (by and large) do what their trainers intend.
December 1, 2024 at 9:45 PM
They generalize well on many datasets, even though there are loads of functions they could learn that would lead to bad generalization (arxiv.org/abs/2306.17844).
December 1, 2024 at 9:44 PM
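To make the "loads of functions" point concrete, here is a toy back-of-the-envelope count (made-up sizes, not anything from the linked paper): with 10-bit inputs and 200 labelled training examples, an astronomical number of boolean functions still fit the training set perfectly.

```python
import math

# Toy count: boolean functions f: {0,1}^n -> {0,1} that agree with a
# training set of m distinct labelled inputs. The remaining 2**n - m
# inputs can be labelled freely, giving 2**(2**n - m) consistent functions.
n, m = 10, 200
log10_consistent = (2**n - m) * math.log10(2)
print(f"~10^{log10_consistent:.0f} boolean functions fit all {m} training points")
# ~10^248; almost all of them generalize badly, yet trained networks
# tend not to land on them.
```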
For example, they're biased towards learning simpler functions (arxiv.org/abs/2006.15191) and low-frequency functions: functions whose outputs change slowly with their inputs (proceedings.mlr.press/v97/rahaman1...).
[Linked paper: "Is SGD a Bayesian sampler? Well, almost" (arxiv.org)]
December 1, 2024 at 9:44 PM
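A minimal sketch of the low-frequency bias, assuming PyTorch (a toy setup, not the cited papers' experiments): a small MLP trained to fit the sum of a low-frequency and a high-frequency sine typically picks up the low-frequency component much earlier in training.

```python
import math
import torch
import torch.nn as nn

torch.manual_seed(0)

# Target: a low-frequency plus a high-frequency sine on [0, 1).
N = 256
x = (torch.arange(N).float() / N).unsqueeze(1)
freqs = [1, 10]
y = sum(torch.sin(2 * math.pi * k * x) for k in freqs)

model = nn.Sequential(nn.Linear(1, 64), nn.ReLU(),
                      nn.Linear(64, 64), nn.ReLU(),
                      nn.Linear(64, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

def amplitudes(pred):
    # Amplitude of each target frequency in the model's current fit,
    # read off the FFT of the prediction over the full period.
    spectrum = torch.fft.rfft(pred.squeeze(-1))
    return [2 * spectrum[k].abs().item() / N for k in freqs]

for step in range(3001):
    opt.zero_grad()
    pred = model(x)
    loss = ((pred - y) ** 2).mean()
    loss.backward()
    opt.step()
    if step % 500 == 0:
        low, high = amplitudes(pred.detach())
        print(f"step {step:4d}  loss {loss.item():.3f}  "
              f"amp@f=1: {low:.2f}  amp@f=10: {high:.2f}")
# Typically amp@f=1 approaches 1 well before amp@f=10 does: the network
# fits the slowly varying component first.
```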
With the problem of induction set aside, we can draw on past observations about the functions that LLMs tend to learn.
December 1, 2024 at 9:44 PM
But even though we can't be certain what function an LLM implements by observing its behavior, I think we can be justified in thinking that some functions are more likely than others.
December 1, 2024 at 9:43 PM
I'm more compelled by the points about the complexity of LLMs and how, for all we know, they could be implementing a wide range of functions. There's been some discussion of this issue under the labels 'goal misgeneralization' and 'inner alignment.'
December 1, 2024 at 9:42 PM
But people don't think painting a wall green is provably impossible. I think most everyone agrees that we can be highly confident we've painted a wall green. The problem of induction just asks *why* we can be highly confident.
December 1, 2024 at 9:42 PM
If alignment is provably impossible in virtue of the problem of induction, then painting a wall green is also provably impossible. After all, any can of paint you buy could be grue, or gred, or grellow, etc.
December 1, 2024 at 9:41 PM
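A toy Bayesian sketch of that reply (made-up prior and numbers, not anything from the thread itself): even if prior mass is spread over many grue-like hypotheses about the paint, repeated green observations leave "it looks green next time" close to certain, because no single switch time gets much mass.

```python
# Hypotheses about a can of paint: "green" (always looks green) and
# "grue_t" for t = 1..T (looks green through time t, blue from t+1 on).
# The prior below is made up: half the mass on "green", the rest split
# evenly over the grue variants (a crude simplicity prior, since
# specifying the switch time t costs extra bits).
T = 1000
prior = {"green": 0.5}
for t in range(1, T + 1):
    prior[f"grue_{t}"] = 0.5 / T

def fits(hyp, n):
    """Does hyp predict the paint looked green at times 1..n?"""
    if hyp == "green":
        return True
    switch = int(hyp.split("_")[1])
    return switch >= n  # still green through time n

n = 50  # we've watched the paint look green 50 times
posterior = {h: p for h, p in prior.items() if fits(h, n)}
z = sum(posterior.values())
posterior = {h: p / z for h, p in posterior.items()}

# Probability the paint looks green at time n + 1 as well:
p_green_next = sum(p for h, p in posterior.items() if fits(h, n + 1))
print(f"P(green at t={n+1} | green at t=1..{n}) = {p_green_next:.4f}")
# Comes out above 0.99: lots of grue hypotheses survive the evidence,
# but no single switch time gets enough mass to make "blue next" credible.
```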