Read the full paper:
arxiv.org/abs/2412.13631
* Interactive testing environments
* Adaptive mentalizing scenarios
* Both cooperative & competitive contexts
8/N
* Misunderstanding LLM capabilities
* Creating inefficient systems
* Missing crucial aspects of human-AI alignment
7/N
* Cooperative tasks: often need minimal ToM
* Competitive scenarios: require deeper recursive reasoning
Current benchmarks don't capture this distinction.
6/N
* Not using ToM when needed
* Using wrong depth of ToM
* Using correct ToM depth but reasoning incorrectly
4/N
* Determining WHETHER to use ToM and at what depth
* Applying the correct inference once you've decided to use it
Current AI research focuses almost exclusively on the *second step*, missing the crucial first one.
1/N