Anthony GX-Chen
@agx-chen.bsky.social
PhD student at NYU CILVR. Prev: Master's at McGill / Mila. || RL, ML, Neuroscience.

https://im-ant.github.io/
Thank you for the shoutout Alison! We actually just arXiv-ed the paper. Attaching the thread below :)

bsky.app/profile/agx-...
Language model (LM) agents are all the rage now—but they may exhibit cognitive biases when inferring causal relationships!

We evaluate LMs on a cognitive task to find:
- LMs struggle with certain simple causal relationships
- They show biases similar to human adults (but not children)

🧵⬇️
May 16, 2025 at 4:48 PM
More details in our paper!
arxiv.org/abs/2505.09614

This was a joint effort with an amazing interdisciplinary team: @dongyanl1n.bsky.social, @mandanas.bsky.social, Doina Precup, @tyrellturing.bsky.social, Rob Fergus, @kennethmarino.bsky.social

We'd love to get your feedback :)
Language Agents Mirror Human Causal Reasoning Biases. How Can We Help Them Think Like Scientists?
arxiv.org
May 16, 2025 at 4:45 PM
How can we help LMs think more rigorously, like scientists?

We fix this “biased prior” by explicitly sampling a higher-entropy hypothesis distribution, then prompting the LM to maximize info gain under the new distribution. This significantly improves exploration and inference performance!
May 16, 2025 at 4:45 PM
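To make the idea above concrete, here is a minimal sketch of the underlying computation, not the paper's code: enumerate blicket hypotheses, place a uniform (higher-entropy) prior over them, and score each candidate intervention by its expected information gain. The hypothesis space, rule semantics, and all names here are assumptions for illustration.

```python
import math
from itertools import product

# Hypothesis = (which objects are blickets, rule type). We put a
# uniform (maximum-entropy) prior over all hypotheses, then score
# each candidate intervention by its expected information gain.

def make_hypotheses(n_objects):
    """Enumerate all (blicket-assignment, rule) hypotheses."""
    return [(blickets, rule)
            for blickets in product([False, True], repeat=n_objects)
            for rule in ("disjunctive", "conjunctive")]

def machine_lights(placed, blickets, rule):
    """Outcome of placing a subset of objects on the detector."""
    on_machine = sum(1 for i in placed if blickets[i])
    total = sum(blickets)
    if rule == "disjunctive":                  # OR: any one blicket suffices
        return on_machine >= 1
    return total >= 1 and on_machine == total  # AND: need every blicket

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

def expected_info_gain(placed, hyps, prior):
    """H(prior) minus the expected entropy of the posterior."""
    eig = entropy(prior)
    for outcome in (True, False):
        # Posterior mass on hypotheses consistent with this outcome.
        consistent = [p for h, p in zip(hyps, prior)
                      if machine_lights(placed, *h) == outcome]
        p_outcome = sum(consistent)
        if p_outcome > 0:
            eig -= p_outcome * entropy([p / p_outcome for p in consistent])
    return eig

hyps = make_hypotheses(3)
prior = [1.0 / len(hyps)] * len(hyps)  # uniform, higher-entropy prior
for placed in [(0,), (0, 1), (0, 1, 2)]:
    print(placed, round(expected_info_gain(placed, hyps, prior), 3))
```

An agent following this scheme would pick the intervention with the highest score; per the post above, the paper instead prompts the LM to pursue the same info-gain objective under the resampled distribution rather than computing it exactly.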
Why do LMs have this “cognitive bias”? We compare the LMs’ behaviour to human data and find that most LMs behave like adults, and less like children, who are more receptive to alternative hypotheses. This may suggest that LMs trained on adult-generated data inherit the same human irrationalities.
May 16, 2025 at 4:45 PM
We evaluate LMs on the classic “Blicket Test” pioneered by @alisongopnik.bsky.social. The goal: assess their ability to discover and infer causal relationships.

Across a range of models, LMs consistently struggle with the “conjunctive” (AND) rule, but not with the “disjunctive” (OR) rule.
May 16, 2025 at 4:45 PM
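For concreteness, a hedged sketch of the two detector rules as they are usually described in blicket experiments; these semantics and names are assumptions, and the paper's exact environment may differ:

```python
# Sketch of the two activation rules (assumed semantics, hypothetical
# names). "Blickets" are the objects that causally control the machine.

def detector_lights(placed_blickets: int, total_blickets: int, rule: str) -> bool:
    """Does the blicket machine light up for a given placement?"""
    if rule == "disjunctive":   # OR: a single blicket is enough
        return placed_blickets >= 1
    if rule == "conjunctive":   # AND: every blicket must be on the machine
        return total_blickets >= 1 and placed_blickets == total_blickets
    raise ValueError(rule)

# With 2 blickets in the world, placing just one of them:
print(detector_lights(1, 2, "disjunctive"))  # True
print(detector_lights(1, 2, "conjunctive"))  # False
```

Under the disjunctive rule a single positive test is already very informative; the conjunctive rule plausibly demands testing combinations of objects, which is where the LMs struggle.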