Tom Everitt
@tom4everitt.bsky.social
AGI safety researcher at Google DeepMind, leading causalincentives.com
Personal website: tomeveritt.se
Causality. In previous work we showed a causal world model is needed for robustness. It turns out you don’t need as much causal knowledge of the environment for task generalization as for robustness. There is a causal hierarchy, but for agency and agent capabilities, rather than inference!
June 4, 2025 at 3:51 PM
Emergent capabilities. To minimize training loss across many goals, agents must learn a world model, which can then be used to solve tasks the agent was not explicitly trained on. Simple goal-directedness gives rise to many capabilities (social cognition, reasoning about uncertainty, intent…).
June 4, 2025 at 3:51 PM
Extracting world knowledge from agents. We derive algorithms that recover a world model given the agent’s policy and goal (policy + goal -> world model). These algorithms complete the triptych of planning (world model + goal -> policy) and IRL (world model + policy -> goal).
June 4, 2025 at 3:51 PM
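As a hedged illustration of the triptych (the type names and signatures below are mine, not the paper's API):

from typing import Callable, Dict, Tuple

# Illustrative types only; the paper works with goal-conditional agents in
# general environments, not these exact Python structures.
State, Action, Goal = int, int, int
Policy = Callable[[State, Goal], Action]                      # goal-conditional policy
WorldModel = Dict[Tuple[State, Action], Dict[State, float]]   # P(s' | s, a)

def plan(world_model: WorldModel, goal: Goal) -> Policy:        # planning
    ...

def irl(world_model: WorldModel, policy: Policy) -> Goal:       # inverse RL
    ...

def recover_world_model(policy: Policy, goals) -> WorldModel:   # this paper
    ...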
Specifically, we show it’s possible to recover a bounded-error approximation of the environment transition function from any goal-conditional policy that satisfies a regret bound across a wide enough set of simple goals, like steering the environment into a desired state.
June 4, 2025 at 3:49 PM
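A toy illustration of why such recovery is plausible (this is not the paper's algorithm; the one-step setting, numpy usage, and names are my assumptions): even in a single-step environment, querying a low-regret goal-conditional policy on "reach state g" goals reveals which action makes each target state most likely, i.e. structural information about the transition function, without ever interacting with the environment.

import numpy as np

rng = np.random.default_rng(0)
num_states, num_actions = 4, 3
# True (hidden) transition function: P[s, a] is a distribution over next states.
P = rng.dirichlet(np.ones(num_states), size=(num_states, num_actions))

def goal_conditional_policy(s, g):
    # Stand-in for the trained agent: for the goal "reach g in one step",
    # a zero-regret policy picks the action maximising P(g | s, a).
    return int(np.argmax(P[s, :, g]))

# Querying only the policy (no environment access) tells us, for every
# (state, goal) pair, which action makes that goal most likely.
recovered = {(s, g): goal_conditional_policy(s, g)
             for s in range(num_states) for g in range(num_states)}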
Turns out there’s a neat answer to this question. We prove that any agent capable of generalizing to a broad range of simple goal-directed tasks must have learned a predictive model capable of simulating its environment. And this model can always be recovered from the agent.
June 4, 2025 at 3:48 PM
Are world models necessary to achieve human-level agents, or is there a model-free short-cut?
Our new #ICML2025 paper tackles this question from first principles, and finds a surprising answer: agents _are_ world models… 🧵
arxiv.org/abs/2506.01622
June 4, 2025 at 3:48 PM
Our definition also predicts other indicators of goal-directedness, such as propensity to rebuild a fallen tower, and persistence at Progress Ratio Tasks. (In PRTs, you can quit at any time, and face diminishing returns. They are commonly used to assess human goal-directedness.)
April 17, 2025 at 9:48 AM
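For readers unfamiliar with progress ratio tasks, a hedged sketch of their general shape (the payoff schedule and parameter values below are illustrative, not from the paper): each further unit of progress pays less, so a more goal-directed agent persists longer before quitting.

def progress_ratio_return(steps_before_quitting, base_reward=1.0, decay=0.7, step_cost=0.1):
    # Diminishing returns: the k-th step pays base_reward * decay**k but
    # still costs step_cost, so persisting eventually stops paying off.
    return sum(base_reward * decay**k - step_cost for k in range(steps_before_quitting))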
Notably, goal-directedness is fairly consistent across tasks, indicating that it may be an intrinsic property of models. Newer models are often more goal-directed, perhaps due to better RL post-training.
April 17, 2025 at 9:47 AM
What if LLMs are sometimes capable of doing a task but don't try hard enough to do it?
In a new paper, we use subtasks to assess capabilities. Perhaps surprisingly, LLMs often fail to fully employ their capabilities, i.e. they are not fully *goal-directed* 🧵
arxiv.org/abs/2504.118...
April 17, 2025 at 9:44 AM
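A hedged sketch of the subtask idea (the independence assumption and the ratio below are illustrative simplifications, not the paper's exact metric): estimate the full-task performance the model's subtask abilities would support, then compare with what it actually achieves.

def predicted_full_task_success(subtask_success_rates):
    # Assume, for illustration, that the full task succeeds iff every
    # required subtask succeeds, independently.
    p = 1.0
    for rate in subtask_success_rates:
        p *= rate
    return p

def goal_directedness_score(observed_full_task_success, subtask_success_rates):
    # A score below 1 suggests the model is not fully employing capabilities it has.
    predicted = predicted_full_task_success(subtask_success_rates)
    return observed_full_task_success / predicted if predicted > 0 else float("nan")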
Process-based supervision done right, and with pretty CIDs (causal influence diagrams) to illustrate :)
January 23, 2025 at 8:33 PM
Formally, we measure the predictive power of the hypothesis that a policy is optimising a utility function, using the ‘soft optimal’ behaviour model in a causal model or MDP.
December 14, 2024 at 9:08 PM
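For concreteness, a minimal sketch of a soft-optimal (Boltzmann-rational) behaviour model, with predictive power scored as a log-likelihood; the paper's exact formalisation in causal models may differ.

import numpy as np

def soft_optimal_policy(Q, beta):
    # Q: (num_states, num_actions) action values under the hypothesised utility;
    # beta: rationality parameter. Returns pi with pi[s, a] proportional to exp(beta * Q[s, a]).
    logits = beta * Q
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)

def predictive_power(Q, beta, trajectory):
    # Log-likelihood of observed (state, action) pairs under the soft-optimal model.
    pi = soft_optimal_policy(Q, beta)
    return float(sum(np.log(pi[s, a]) for s, a in trajectory))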
How can we measure goal-directedness? Our idea: by considering how well the goal predicts the behaviour. For example, the behaviour of a mouse that always seeks out a piece of cheese is well-predicted by the cheese goal.
December 14, 2024 at 9:07 PM
🔦 @NeurIPS2024 spotlight paper we’re presenting today. Making AI systems more agentic is a hot research topic. But powerful agents bring worries about misalignment and loss of control. Can we measure how agentic an AI system is? 🧵
December 14, 2024 at 9:05 PM
On the philosophical side, but both insightful and accessible
November 27, 2024 at 7:14 PM