Anthony GX-Chen
@agx-chen.bsky.social
PhD student at NYU CILVR. Prev: Master's at McGill / Mila. || RL, ML, Neuroscience.

https://im-ant.github.io/
Thank you for the shoutout Alison! We actually just arXiv-ed the paper. Attaching the thread below :)

bsky.app/profile/agx-...
Language model (LM) agents are all the rage now—but they may exhibit cognitive biases when inferring causal relationships!

We evaluate LMs on a cognitive task to find:
- LMs struggle with certain simple causal relationships
- They show biases similar to human adults (but not children)

🧵⬇️
May 16, 2025 at 4:48 PM
More details in our paper!
arxiv.org/abs/2505.09614

This was a joint effort with an amazing interdisciplinary team: @dongyanl1n.bsky.social, @mandanas.bsky.social, Doina Precup, @tyrellturing.bsky.social, Rob Fergus, @kennethmarino.bsky.social

We'd love to get your feedback :)
Language Agents Mirror Human Causal Reasoning Biases. How Can We Help Them Think Like Scientists?
arxiv.org
May 16, 2025 at 4:45 PM
How can we help LMs think more rigorously, like scientists?

We fix this “biased prior” by explicitly sampling a higher-entropy hypothesis distribution, then prompting the LM to maximize info gain under the new distribution. This significantly improves exploration and inference performance!
May 16, 2025 at 4:45 PM
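To make the idea above concrete, here is a minimal sketch of the underlying computation, not the paper's code: enumerate blicket hypotheses, place a uniform (higher-entropy) prior over them, and score each candidate intervention by its expected information gain. The hypothesis space, rule semantics, and all names here are assumptions for illustration.

```python
import math
from itertools import product

# Hypothesis = (which objects are blickets, rule type). We put a
# uniform (maximum-entropy) prior over all hypotheses, then score
# each candidate intervention by its expected information gain.

def make_hypotheses(n_objects):
    """Enumerate all (blicket-assignment, rule) hypotheses."""
    return [(blickets, rule)
            for blickets in product([False, True], repeat=n_objects)
            for rule in ("disjunctive", "conjunctive")]

def machine_lights(placed, blickets, rule):
    """Outcome of placing a subset of objects on the detector."""
    on_machine = sum(1 for i in placed if blickets[i])
    total = sum(blickets)
    if rule == "disjunctive":                  # OR: any one blicket suffices
        return on_machine >= 1
    return total >= 1 and on_machine == total  # AND: need every blicket

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

def expected_info_gain(placed, hyps, prior):
    """H(prior) minus the expected entropy of the posterior."""
    eig = entropy(prior)
    for outcome in (True, False):
        # Posterior mass on hypotheses consistent with this outcome.
        consistent = [p for h, p in zip(hyps, prior)
                      if machine_lights(placed, *h) == outcome]
        p_outcome = sum(consistent)
        if p_outcome > 0:
            eig -= p_outcome * entropy([p / p_outcome for p in consistent])
    return eig

hyps = make_hypotheses(3)
prior = [1.0 / len(hyps)] * len(hyps)  # uniform, higher-entropy prior
for placed in [(0,), (0, 1), (0, 1, 2)]:
    print(placed, round(expected_info_gain(placed, hyps, prior), 3))
```

An agent following this scheme would pick the intervention with the highest score; per the post above, the paper instead prompts the LM to pursue the same info-gain objective under the resampled distribution rather than computing it exactly.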
Why do LMs have this “cognitive bias”? We compare the LMs’ behaviour to human data and find that most LMs behave like adults, and less like children, who are more receptive to alternative hypotheses. This may suggest that LMs trained on adult-generated data inherit the same human irrationalities.
May 16, 2025 at 4:45 PM
We evaluate LMs on the classic “Blicket Test” pioneered by @alisongopnik.bsky.social. The goal: assess their ability to discover and infer causal relationships.

Across a range of models, LMs consistently struggle with the “conjunctive” (AND) rule, but not with the “disjunctive” (OR) rule.
May 16, 2025 at 4:45 PM
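For concreteness, a hedged sketch of the two detector rules as they are usually described in blicket experiments; these semantics and names are assumptions, and the paper's exact environment may differ:

```python
# Sketch of the two activation rules (assumed semantics, hypothetical
# names). "Blickets" are the objects that causally control the machine.

def detector_lights(placed_blickets: int, total_blickets: int, rule: str) -> bool:
    """Does the blicket machine light up for a given placement?"""
    if rule == "disjunctive":   # OR: a single blicket is enough
        return placed_blickets >= 1
    if rule == "conjunctive":   # AND: every blicket must be on the machine
        return total_blickets >= 1 and placed_blickets == total_blickets
    raise ValueError(rule)

# With 2 blickets in the world, placing just one of them:
print(detector_lights(1, 2, "disjunctive"))  # True
print(detector_lights(1, 2, "conjunctive"))  # False
```

Under the disjunctive rule a single positive test is already very informative; the conjunctive rule plausibly demands testing combinations of objects, which is where the LMs struggle.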