lisaalaz.github.io
Website: agentcoma.github.io
Preprint: arxiv.org/abs/2508.19988
A big thanks to my brilliant coauthors Lihu Chen, Ana Brassard, @joestacey.bsky.social, @rahmanidashti.bsky.social and @marekrei.bsky.social!
Note: We welcome submissions to the #AgentCoMa leaderboard from researchers 🚀
We find that tasks combining different reasoning types are a relatively unseen pattern for LLMs, which leads the models to produce contextual hallucinations when presented with mixed-type compositional reasoning.
- LLMs perform relatively well on compositional tasks of similar difficulty when all steps require the same type of reasoning.
- Non-expert humans with no calculator or internet access can solve the tasks in #AgentCoMa as accurately as they solve the individual steps.
This work was done at @cohere.com with a fantastic team: @maxbartolo.bsky.social, Tan Yi-Chern, Jon Ander Campos, @maximilianmozes.bsky.social and @marekrei.bsky.social.
This work was done at @cohere.com with amazing collaborators @maxbartolo.bsky.social, @maximilianmozes.bsky.social, Jon Ander Campos, Yi Chern Tan and @marekrei.bsky.social.