We present a benchmark of 57+ competitive text-based games to evaluate and train LLMs,
including negotiation, deception, theory of mind...
Multiplayer support
Human-vs-model
Model-vs-model
Perfect for social interaction, multi-agent, multi-turn reasoning, and planning
🤖📈
ok TBF we did implicitly prompt this a little by asking why the punchable people it generated were always white buuuut not sure it was making quite the right inference after that