Stanford
jaredmoore.org
Preprint: arxiv.org/abs/2507.16196
Code: github.com/jlcmoore/mindgames
Demo: mindgames.camrobjones.com
/end 🧵
I'll also be presenting it at the PragLM workshop at COLM in Montreal this October.
* Spectatorial ToM: Observing and predicting mental states.
* Planning ToM: Actively intervening to change mental states through interaction.
Current LLMs excel at the first but fail at the second.
In the REVEALED condition (mental states given to persuader):
Humans: 22% success ❌
o1-preview: 78% success ✅
In the HIDDEN condition (persuader must infer mental states):
Humans: 29% success ✅
o1-preview: 18% success ❌
Complete reversal!
*a bot
Declan Grabb
@wagnew.dair-community.social
@klyman.bsky.social
@schancellor.bsky.social
Nick Haber
@desmond-ong.bsky.social
Thanks ❤️
arxiv.org/abs/2504.18412
Please reach out if you'd like to meet!
And read @StanfordHAI's post about our work here:
https://t.co/h3CaBVnX7g
We found that people supported the contractualist Nash Product over the Utilitarian Sum.
Preprint here:
https://arxiv.org/abs/2410.05496
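For anyone unfamiliar with the two aggregation rules, here's a rough sketch of how they can disagree. This is not the paper's code: the function names, the two options, and the utility numbers are all made up for illustration, and it assumes every party's utility is positive.

```python
from math import prod

def utilitarian_sum(utilities):
    # Utilitarian Sum: add up every party's utility.
    return sum(utilities)

def nash_product(utilities):
    # Nash Product: multiply every party's utility (assumes all are positive).
    return prod(utilities)

# Hypothetical options, each giving a utility to two parties.
options = {
    "A": [9, 1],  # high total, very uneven
    "B": [4, 4],  # lower total, even split
}

best_by_sum = max(options, key=lambda name: utilitarian_sum(options[name]))
best_by_product = max(options, key=lambda name: nash_product(options[name]))

print("Utilitarian Sum picks:", best_by_sum)
print("Nash Product picks:", best_by_product)
```

With these toy numbers the sum favors the lopsided option (10 vs. 8) while the product favors the even one (9 vs. 16), which is the kind of disagreement the two rules can produce.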