https://im-ant.github.io/
bsky.app/profile/agx-...
We evaluate LMs on a cognitive task and find:
- LMs struggle with certain simple causal relationships
- They show biases similar to human adults (but not children)
🧵⬇️
arxiv.org/abs/2505.09614
This was a joint effort with an amazing interdisciplinary team: @dongyanl1n.bsky.social , @mandanas.bsky.social , Doina Precup, @tyrellturing.bsky.social , Rob Fergus, @kennethmarino.bsky.social
We'd love to get your feedback :)
We fix this “biased prior” by explicitly sampling a higher-entropy hypothesis distribution, then prompting the LM to maximize info gain under the new distribution. This significantly improves exploration and inference performance!
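For intuition, here's a minimal Python sketch of the quantity the LM is prompted to maximize: expected information gain under a uniform, higher-entropy hypothesis distribution. This is illustrative only, not the paper's actual pipeline (the object names, hypothesis space, and helpers are assumptions, and the distribution is enumerated uniformly here rather than sampled).

```python
import itertools
import math

# Hypothetical blicket-style setup: objects are placed on a machine, and each
# hypothesis is a (rule type, blicket set) pair predicting whether it activates.
OBJECTS = ("A", "B", "C")

def predicts_activation(rule_type, blickets, placed):
    """Does hypothesis (rule_type, blickets) predict the machine turns on?"""
    if rule_type == "OR":        # disjunctive: any one blicket suffices
        return len(placed & blickets) >= 1
    else:                        # "AND", conjunctive: all blickets required together
        return blickets <= placed

# Hypothesis space: every nonempty subset of objects as the candidate blicket
# set, crossed with the two rule types.
HYPOTHESES = [
    (rule, frozenset(combo))
    for r in range(1, len(OBJECTS) + 1)
    for combo in itertools.combinations(OBJECTS, r)
    for rule in ("OR", "AND")
]

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

def flat_prior():
    """Higher-entropy (uniform) distribution over hypotheses, replacing a biased prior."""
    return {h: 1.0 / len(HYPOTHESES) for h in HYPOTHESES}

def expected_info_gain(prior, placed):
    """Prior entropy minus expected posterior entropy for one intervention."""
    h_prior = entropy(prior.values())
    expected_posterior = 0.0
    for outcome in (True, False):
        # Hypotheses consistent with this outcome keep their mass; the rest drop out.
        mass = {h: p for h, p in prior.items()
                if predicts_activation(*h, placed) == outcome}
        p_outcome = sum(mass.values())
        if p_outcome > 0:
            posterior = [p / p_outcome for p in mass.values()]
            expected_posterior += p_outcome * entropy(posterior)
    return h_prior - expected_posterior

# Score every possible intervention and pick the most informative one.
prior = flat_prior()
interventions = [frozenset(c) for r in range(len(OBJECTS) + 1)
                 for c in itertools.combinations(OBJECTS, r)]
best = max(interventions, key=lambda placed: expected_info_gain(prior, placed))
print("Most informative objects to place:", sorted(best),
      "| EIG =", round(expected_info_gain(prior, best), 3))
```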
Across a range of models, LMs consistently struggle with the “conjunctive” (AND) rule, but not with the “disjunctive” (OR) rule.
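Concretely, here's a tiny illustration of the two rule types (an assumed blicket-style setup for clarity, not necessarily the paper's exact task):

```python
# Suppose A and B are the blickets among {A, B, C}.
blickets = {"A", "B"}

def disjunctive(placed):   # OR rule: any one blicket activates the machine
    return len(placed & blickets) >= 1

def conjunctive(placed):   # AND rule: all blickets must be placed together
    return blickets <= placed

for placed in ({"A"}, {"A", "C"}, {"A", "B"}):
    print(sorted(placed), "-> OR:", disjunctive(placed), "| AND:", conjunctive(placed))
# {'A'} and {'A', 'C'} activate under OR but not AND; only {'A', 'B'} activates under AND.
```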