Tom Everitt
@tom4everitt.bsky.social
AGI safety researcher at Google DeepMind, leading causalincentives.com
Personal website: tomeveritt.se
"We think that Mars could be green in our lifetime
This is not an Earth clone, but rather a thin, life-supporting envelope that still exhibits large day-to-night temperature swings but blocks most radiation. Such a state would allow people to live outside on the planet’s surface"
Very cool!
October 29, 2025 at 6:37 PM
"We think that Mars could be green in our lifetime
This is not an Earth clone, but rather a thin, life-supporting envelope that still exhibits large day-to-night temperature swings but blocks most radiation. Such a state would allow people to live outside on the planet’s surface"
Very cool!
This is not an Earth clone, but rather a thin, life-supporting envelope that still exhibits large day-to-night temperature swings but blocks most radiation. Such a state would allow people to live outside on the planet’s surface"
Very cool!
I was initially confused about how they managed to do a randomized controlled trial on this. It seems that, within each workflow, they randomly turned on the tool for a subset of the customers
October 15, 2025 at 8:11 PM
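(A rough sketch of what that per-workflow randomization might look like; the names and the 50/50 split are hypothetical, purely to illustrate the design, not their actual setup:)

```python
import random

def assign_treatment(workflows, treatment_fraction=0.5, seed=0):
    """Within each workflow, enable the tool for a random subset of
    customers (treatment) and leave it off for the rest (control)."""
    rng = random.Random(seed)
    assignment = {}
    for workflow, customers in workflows.items():
        for customer in customers:
            arm = "tool_on" if rng.random() < treatment_fraction else "tool_off"
            assignment[(workflow, customer)] = arm
    return assignment

# Hypothetical example: two workflows, each with its own customer pool
workflows = {"billing": ["c1", "c2", "c3"], "onboarding": ["c4", "c5"]}
print(assign_treatment(workflows))
```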
the focus on practical capacities is very sensible! though on that basis, I thought you would focus on what LLMs do to humans' practical capacity to feel empathy with other beings, rather than whether LLMs satisfy humans' need to be empathized with
October 9, 2025 at 8:15 PM
Interesting. Could the measure also be applied to the human, assessing changes to their empowerment over time?
October 2, 2025 at 7:57 PM
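(For reference, the standard information-theoretic definition of empowerment, which one could in principle also evaluate for the human at successive times to track changes; this is the textbook notion, not necessarily the exact measure used in the work being discussed:)

$$
\mathfrak{E}_n(s_t) \;=\; \max_{p(a_t,\dots,a_{t+n-1})} I\bigl(A_t,\dots,A_{t+n-1};\, S_{t+n} \,\bigm|\, s_t\bigr)
$$

i.e. the channel capacity from the agent's next n actions to the state n steps later.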
Interesting, does the method rely on being able to set different goals for the LLM?
October 2, 2025 at 5:11 PM
Interesting. I recall Rich Sutton made a similar suggestion in the 3rd edition of his RL book, arguing we should optimize average reward rather than discounted reward
September 25, 2025 at 8:22 PM
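(For context, the two objectives being contrasted, in standard notation; these are the textbook definitions rather than anything from the linked work: the discounted return versus the average-reward criterion,

$$
J_\gamma(\pi) \;=\; \mathbb{E}_\pi\!\left[\sum_{t=0}^{\infty} \gamma^t R_{t+1}\right]
\qquad\text{vs.}\qquad
\bar r(\pi) \;=\; \lim_{h\to\infty} \frac{1}{h}\,\mathbb{E}_\pi\!\left[\sum_{t=1}^{h} R_t\right],
$$

the argument being that in continuing tasks the average reward is the more natural objective than picking a discount factor.)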
Reposted by Tom Everitt
digital-strategy.ec.europa.eu/en/policies/... The Code also has two other, separate Chapters (Copyright, Transparency). The Chapter I co-chaired (Safety & Security) is a compliance tool for the small number of frontier AI companies to whom the “Systemic Risk” obligations of the AI Act apply.
2/3
The General-Purpose AI Code of Practice
The Code of Practice helps industry comply with the AI Act legal obligations on safety, transparency and copyright of general-purpose AI models.
July 10, 2025 at 11:53 AM
This is an interesting explanation. But surely boys falling behind is nevertheless an important and underrated problem?
June 27, 2025 at 9:07 PM
Interesting. But is case 2 *real* introspection? It infers its internal temperature based on its external output, which feels more like inference based on exospection than proper introspection. (I know human "intro"spection often works like this too, but still)
June 10, 2025 at 7:50 PM
… and many more! Check out our paper arxiv.org/pdf/2506.01622, or come chat to @jonrichens.bsky.social, @dabelcs.bsky.social or Alexis Bellot at #ICML2025
June 4, 2025 at 3:54 PM
Causality. In previous work we showed a causal world model is needed for robustness. It turns out you don’t need as much causal knowledge of the environment for task generalization. There is a causal hierarchy, but for agency and agent capabilities, rather than inference!
June 4, 2025 at 3:51 PM
Emergent capabilities. To minimize training loss across many goals, agents must learn a world model, which can solve tasks the agent was not explicitly trained on. Simple goal-directedness gives rise to many capabilities (social cognition, reasoning about uncertainty, intent…).
June 4, 2025 at 3:51 PM
Safety. Several approaches to AI safety require accurate world models, but agent capabilities could outpace our ability to build them. Our work gives a theoretical guarantee: we can extract world models from agents, and the model fidelity increases with the agent's capabilities.
June 4, 2025 at 3:51 PM
Extracting world knowledge from agents. We derive algorithms that recover a world model given the agent’s policy and goal (policy + goal -> world model). These algorithms complete the triptych of planning (world model + goal -> policy) and IRL (world model + policy -> goal).
June 4, 2025 at 3:51 PM
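(A minimal sketch of the triptych as function stubs, just to make the symmetry explicit; the type aliases and names are hypothetical and not the paper's algorithms:)

```python
from typing import Any, Callable

WorldModel = Any   # e.g. transition probabilities P(s' | s, a)
Goal = Any         # e.g. a reward function or target outcome
Policy = Callable  # e.g. a mapping from observations to actions

def plan(world_model: WorldModel, goal: Goal) -> Policy:
    """Planning: world model + goal -> policy."""
    ...

def inverse_rl(world_model: WorldModel, policy: Policy) -> Goal:
    """Inverse RL: world model + policy -> goal."""
    ...

def extract_world_model(policy: Policy, goal: Goal) -> WorldModel:
    """The new direction: policy + goal -> world model."""
    ...
```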
Fundamental limitations on agency. In environments where the dynamics are provably hard to learn, or where long-horizon prediction is infeasible, the capabilities of agents are fundamentally bounded.
June 4, 2025 at 3:50 PM
No model-free path. If you want to train an agent capable of a wide range of goal-directed tasks, you can’t avoid the challenge of learning a world model. And to improve performance or generality, agents need to learn increasingly accurate and detailed world models.
June 4, 2025 at 3:50 PM
These results have several interesting consequences, from emergent capabilities to AI safety… 👇
June 4, 2025 at 3:49 PM