Nitay Alon
@nitalon.bsky.social
PhD student (@huji, @maxplanck). Trying to understand the role of Theory of Mind in AGI. Also working on Multi-agent RL, language, and (some) economics.
Our hoax is meant to call for deeper, beyond-the-benchmarks research into Artificial ToM. You can learn more about our mission at sites.google.com/view/theory-...
ToM4AI Workshop 2025
Registration 8:00 - 9:00
sites.google.com
April 1, 2025 at 5:50 PM
We show that, using our novel algorithm dubbed ToM and GeRRi and training on pure Sally-Anne tasks, we can train a model to achieve the ToM level of a 3-year-old. This is an amazing step in the development of Artificial ToM. If only it were true... (April Fools'.)
April 1, 2025 at 5:50 PM
This work bridges cognitive science and AI research, suggesting ways to make ToM evaluation more comprehensive and meaningful for real-world applications.
Read the full paper:
arxiv.org/abs/2412.13631
Mind Your Theory: Theory of Mind Goes Deeper Than Reasoning
Theory of Mind (ToM) capabilities in LLMs have recently become a central object of investigation. Cognitive science distinguishes between two steps required for ToM tasks: 1) determine whether to invo...
arxiv.org
December 19, 2024 at 12:01 PM
Our paper proposes new directions for ToM evaluation inspired by cognitive science:
* Interactive testing environments
* Adaptive mentalizing scenarios
* Both cooperative & competitive contexts
8/N
December 19, 2024 at 12:01 PM
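A minimal, purely illustrative sketch of what an interactive, adaptive evaluation along the lines proposed above could look like: the required mentalizing depth changes from round to round and between cooperative and competitive contexts, and the evaluator scores the agent's choice of depth rather than only its final answer. The `agent` callable and the scoring rule are assumptions for illustration, not an existing benchmark.

```python
import random

def evaluate(agent, n_rounds: int = 10, seed: int = 0) -> float:
    """Score how often the agent picks an appropriate mentalizing depth."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(n_rounds):
        context = rng.choice(["cooperative", "competitive"])
        # Cooperative rounds mostly need shallow ToM; competitive rounds need deeper ToM.
        required_depth = rng.choice([0, 1]) if context == "cooperative" else rng.choice([1, 2, 3])
        chosen_depth = agent(context)   # the agent only sees the context, not the answer key
        correct += int(chosen_depth == required_depth)
    return correct / n_rounds

# A toy agent that always mentalizes at depth 1, whatever the context:
print(evaluate(lambda context: 1))
```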
🤖 Why does this matter for AI? Without proper ToM evaluation, we risk:
* Misunderstanding LLM capabilities
* Creating inefficient systems
* Missing crucial aspects of human-AI alignment
7/N
December 19, 2024 at 12:01 PM
📊 Different scenarios need different depths of ToM:
* Cooperative tasks: often need minimal ToM
* Competitive scenarios: require deeper recursive reasoning
Current benchmarks don't capture this distinction (toy sketch below).
6/N
December 19, 2024 at 12:01 PM
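To make the depth point concrete, here is a standard level-k sketch (not from the paper): in a competitive game, each extra level of recursion changes the predicted move, so agents reasoning at different depths genuinely disagree, and a benchmark that never varies the required depth cannot tell them apart.

```python
# Level-k reasoning in rock-paper-scissors: a level-k player best-responds
# to an opponent assumed to reason at level k-1. Purely illustrative.
BEATS = {"rock": "paper", "paper": "scissors", "scissors": "rock"}

def level_k_move(k: int, level0_move: str = "rock") -> str:
    if k == 0:
        return level0_move                 # non-strategic default, no mentalizing
    opponent_move = level_k_move(k - 1, level0_move)
    return BEATS[opponent_move]            # play whatever beats the predicted move

for k in range(4):
    print(f"level-{k} plays {level_k_move(k)}")
```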
🧪 Most existing work treats ToM as a static logic problem. But in reality, it's a dynamic process that evolves during interaction. We need new ways to evaluate this in LLMs. 5/N
December 19, 2024 at 12:01 PM
💡 Key insight: Current evaluations can't distinguish between different types of ToM errors:
* Not using ToM when needed
* Using wrong depth of ToM
* Using correct ToM depth but reasoning incorrectly
4/N
December 19, 2024 at 12:01 PM
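A hedged sketch of how the three error types above could be kept apart when scoring a model. The fields (whether ToM was engaged, the depth used, the final answer) and the per-item "required depth" are hypothetical evaluation metadata, not an existing benchmark format.

```python
def classify_tom_error(engaged: bool, depth: int, answer: str,
                       required_depth: int, gold_answer: str) -> str:
    """Separate the three failure modes instead of reporting one accuracy number."""
    if required_depth > 0 and not engaged:
        return "did not invoke ToM when it was needed"
    if engaged and depth != required_depth:
        return "invoked ToM at the wrong depth"
    if answer != gold_answer:
        return "right depth, wrong inference"
    return "correct"

# Example: the model mentalizes, but only one level deep when two were needed.
print(classify_tom_error(engaged=True, depth=1, answer="box",
                         required_depth=2, gold_answer="basket"))
```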
🔍 Current ToM benchmarks for LLMs typically present static scenarios where it's obvious ToM should be used (like the classic Sally-Anne test). But real social interaction is dynamic - we constantly decide whether to model others' minds. 3/N
December 19, 2024 at 12:01 PM
Think of it like this: humans don't always use ToM. Oftentimes we rely on simple rules or social norms. Using ToM requires mental effort and resources. We adaptively choose when to engage it. 2/N
December 19, 2024 at 12:01 PM
ToM involves two key steps:
* Determining WHETHER to use ToM and at what depth
* Applying the correct inference once you've decided to use it
Current AI research focuses almost exclusively on the *second step*, missing the crucial first one (sketch below). 1/N
December 19, 2024 at 12:01 PM
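A minimal sketch of the two-step view described above, assuming a simple cost-benefit rule for step 1 (whether, and how deep, to mentalize) and a placeholder for step 2 (the inference itself). All names and thresholds are illustrative, not the paper's model.

```python
from dataclasses import dataclass

@dataclass
class Situation:
    is_social: bool        # does another agent's mind matter here?
    stakes: float          # cost of getting the other agent wrong
    norm_prediction: str   # what a simple rule / social norm would say

def choose_tom_depth(s: Situation, effort_cost: float = 0.3) -> int:
    """Step 1: decide whether ToM is worth its cognitive cost, and how deep."""
    if not s.is_social or s.stakes < effort_cost:
        return 0                                   # rely on norms, no mentalizing
    return 1 if s.stakes < 2 * effort_cost else 2  # go deeper only when the stakes justify it

def infer(s: Situation, depth: int) -> str:
    """Step 2: run the inference at the chosen depth (placeholder logic)."""
    if depth == 0:
        return s.norm_prediction
    return f"belief-based prediction at depth {depth}"

s = Situation(is_social=True, stakes=0.9, norm_prediction="follow the queue")
print(infer(s, choose_tom_depth(s)))
```

An evaluation that only probes infer() (step 2) would never notice an agent whose choose_tom_depth() (step 1) is systematically wrong, which is the gap the thread argues current benchmarks leave open.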