Nitay Alon
@nitalon.bsky.social
PhD student (@huji, @maxplanck). Trying to understand the role of Theory of Mind in AGI. Also working on Multi-agent RL, language, and (some) economics.
Our hoax is meant to call for deeper, beyond-the-benchmarks research into Artificial ToM. You can learn more about our mission at sites.google.com/view/theory-...
ToM4AI Workshop 2025
Registration 8:00 - 9:00
sites.google.com
April 1, 2025 at 5:50 PM
We show that, using our novel algorithm dubbed ToM and GeRRi and training on pure Sally-Anne tasks, we can train a model to achieve the ToM level of a 3-year-old. This is an amazing step in the development of Artificial ToM. If only it were true... (April Fools'.)
April 1, 2025 at 5:50 PM
This work bridges cognitive science and AI research, suggesting ways to make ToM evaluation more comprehensive and meaningful for real-world applications.
Read the full paper:
arxiv.org/abs/2412.13631
Mind Your Theory: Theory of Mind Goes Deeper Than Reasoning
Theory of Mind (ToM) capabilities in LLMs have recently become a central object of investigation. Cognitive science distinguishes between two steps required for ToM tasks: 1) determine whether to invo...
arxiv.org
December 19, 2024 at 12:01 PM
Our paper proposes new directions for ToM evaluation inspired by cognitive science:
* Interactive testing environments
* Adaptive mentalizing scenarios
* Both cooperative & competitive contexts
8/N
December 19, 2024 at 12:01 PM
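A minimal, purely illustrative sketch of what an interactive, adaptive evaluation along the lines proposed above could look like: the required mentalizing depth changes from round to round and between cooperative and competitive contexts, and the evaluator scores the agent's choice of depth rather than only its final answer. The `agent` callable and the scoring rule are assumptions for illustration, not an existing benchmark.

```python
import random

def evaluate(agent, n_rounds: int = 10, seed: int = 0) -> float:
    """Score how often the agent picks an appropriate mentalizing depth."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(n_rounds):
        context = rng.choice(["cooperative", "competitive"])
        # Cooperative rounds mostly need shallow ToM; competitive rounds need deeper ToM.
        required_depth = rng.choice([0, 1]) if context == "cooperative" else rng.choice([1, 2, 3])
        chosen_depth = agent(context)   # the agent only sees the context, not the answer key
        correct += int(chosen_depth == required_depth)
    return correct / n_rounds

# A toy agent that always mentalizes at depth 1, whatever the context:
print(evaluate(lambda context: 1))
```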
🤖 Why does this matter for AI? Without proper ToM evaluation, we risk:
* Misunderstanding LLM capabilities
* Creating inefficient systems
* Missing crucial aspects of human-AI alignment
7/N
December 19, 2024 at 12:01 PM
📊 Different scenarios need different depths of ToM:
* Cooperative tasks: often need minimal ToM
* Competitive scenarios: require deeper recursive reasoning
Current benchmarks don't capture this distinction (toy sketch below).
6/N
December 19, 2024 at 12:01 PM
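To make the depth point concrete, here is a standard level-k sketch (not from the paper): in a competitive game, each extra level of recursion changes the predicted move, so agents reasoning at different depths genuinely disagree, and a benchmark that never varies the required depth cannot tell them apart.

```python
# Level-k reasoning in rock-paper-scissors: a level-k player best-responds
# to an opponent assumed to reason at level k-1. Purely illustrative.
BEATS = {"rock": "paper", "paper": "scissors", "scissors": "rock"}

def level_k_move(k: int, level0_move: str = "rock") -> str:
    if k == 0:
        return level0_move                 # non-strategic default, no mentalizing
    opponent_move = level_k_move(k - 1, level0_move)
    return BEATS[opponent_move]            # play whatever beats the predicted move

for k in range(4):
    print(f"level-{k} plays {level_k_move(k)}")
```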
🧪 Most existing work treats ToM as a static logic problem. But in reality, it's a dynamic process that evolves during interaction. We need new ways to evaluate this in LLMs. 5/N
December 19, 2024 at 12:01 PM
💡 Key insight: Current evaluations can't distinguish between different types of ToM errors:
* Not using ToM when needed
* Using wrong depth of ToM
* Using correct ToM depth but reasoning incorrectly
4/N
December 19, 2024 at 12:01 PM
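A hedged sketch of how the three error types above could be kept apart when scoring a model. The fields (whether ToM was engaged, the depth used, the final answer) and the per-item "required depth" are hypothetical evaluation metadata, not an existing benchmark format.

```python
def classify_tom_error(engaged: bool, depth: int, answer: str,
                       required_depth: int, gold_answer: str) -> str:
    """Separate the three failure modes instead of reporting one accuracy number."""
    if required_depth > 0 and not engaged:
        return "did not invoke ToM when it was needed"
    if engaged and depth != required_depth:
        return "invoked ToM at the wrong depth"
    if answer != gold_answer:
        return "right depth, wrong inference"
    return "correct"

# Example: the model mentalizes, but only one level deep when two were needed.
print(classify_tom_error(engaged=True, depth=1, answer="box",
                         required_depth=2, gold_answer="basket"))
```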
🔍 Current ToM benchmarks for LLMs typically present static scenarios where it's obvious ToM should be used (like the classic Sally-Anne test). But real social interaction is dynamic - we constantly decide whether to model others' minds. 3/N
December 19, 2024 at 12:01 PM
Think of it like this: humans don't always use ToM. Oftentimes we rely on simple rules or social norms. Using ToM requires mental effort and resources. We adaptively choose when to engage it. 2/N
December 19, 2024 at 12:01 PM
ToM involves two key steps:
* Determining WHETHER to use ToM and at what depth
* Applying the correct inference once you've decided to use it
Current AI research focuses almost exclusively on the *second step*, missing the crucial first one (sketch below). 1/N
December 19, 2024 at 12:01 PM
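A minimal sketch of the two-step view described above, assuming a simple cost-benefit rule for step 1 (whether, and how deep, to mentalize) and a placeholder for step 2 (the inference itself). All names and thresholds are illustrative, not the paper's model.

```python
from dataclasses import dataclass

@dataclass
class Situation:
    is_social: bool        # does another agent's mind matter here?
    stakes: float          # cost of getting the other agent wrong
    norm_prediction: str   # what a simple rule / social norm would say

def choose_tom_depth(s: Situation, effort_cost: float = 0.3) -> int:
    """Step 1: decide whether ToM is worth its cognitive cost, and how deep."""
    if not s.is_social or s.stakes < effort_cost:
        return 0                                   # rely on norms, no mentalizing
    return 1 if s.stakes < 2 * effort_cost else 2  # go deeper only when the stakes justify it

def infer(s: Situation, depth: int) -> str:
    """Step 2: run the inference at the chosen depth (placeholder logic)."""
    if depth == 0:
        return s.norm_prediction
    return f"belief-based prediction at depth {depth}"

s = Situation(is_social=True, stakes=0.9, norm_prediction="follow the queue")
print(infer(s, choose_tom_depth(s)))
```

An evaluation that only probes infer() (step 2) would never notice an agent whose choose_tom_depth() (step 1) is systematically wrong, which is the gap the thread argues current benchmarks leave open.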