Jérémie Beucler
@jeremiebeucler.bsky.social
PhD student with Wim de Neys & Lucie Charles at LaPsyDE; MSc in Cog Sciences at ENS - interested in reasoning & metacognition

https://jeremie-beucler.github.io/
Brilliant! Congrats Tanay 🙌
November 4, 2025 at 12:15 PM
10/10

Huge thanks to my great co-authors @zoepurcell.bsky.social, @luciecharlesneuro.bsky.social, and @wimdeneys.bsky.social, and to my lab @lapsyde.bsky.social.

Stay tuned for the computational modeling part! 🤓

You can access the preprint here: osf.io/preprints/ps...
October 16, 2025 at 4:17 PM
9/10

To make this more practical, we release the 'baserater' R package. It allows you to access the database easily and to generate new items automatically using the LLM and prompt of your choice.

GitHub: jeremie-beucler.github.io/baserater (soon on CRAN!)
October 16, 2025 at 4:17 PM
8/10

We also re-analyzed existing base-rate stimuli from past research using our method. The results revealed a large, previously unnoticed variability in belief strength, which could be problematic in some cases.
October 16, 2025 at 4:17 PM
7/10

This method allows us to create a massive database of over 100,000 base-rate items, each with an associated belief strength value.

Here is an example of every possible item for a single adjective out of 66 ("Arrogant")! Better to be a kindergarten teacher than a politician in this case. 🤭
October 16, 2025 at 4:17 PM
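A sketch of how such an item grid could be assembled for one adjective. The group names and typicality scores below are made up for illustration, not values from the database:

```python
from itertools import permutations

# Hypothetical typicality scores for one adjective ("arrogant"):
# how typical the trait is of each group (made-up values).
typicality = {
    "politician": 9.1,
    "lawyer": 7.4,
    "nurse": 2.9,
    "kindergarten teacher": 1.8,
}

# One base-rate item per ordered group pair: belief strength here is
# sketched as the typicality gap between the two groups.
items = {
    (g1, g2): typicality[g1] - typicality[g2]
    for g1, g2 in permutations(typicality, 2)
}
print(len(items))  # 4 groups -> 12 ordered pairs
```

With 66 adjectives and many groups, this cross-product construction is how a database can grow to 100,000+ items from a much smaller set of ratings.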
6/10

And it works really well! LLM-generated ratings showed a very strong correlation with human judgments.

More importantly, our belief-strength measure robustly predicted participants' actual choices in a separate base-rate neglect experiment!
October 16, 2025 at 4:17 PM
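The kind of check involved can be sketched like this: correlate LLM ratings with human ratings for the same trait-group pairs. The numbers below are invented for illustration, not the paper's data:

```python
import numpy as np

# Hypothetical ratings for the same trait-group pairs,
# one value per pair (made-up data for illustration).
llm_ratings = np.array([8.5, 2.1, 6.7, 9.0, 3.4, 7.2])
human_ratings = np.array([8.1, 2.8, 6.0, 9.3, 3.0, 7.5])

# Pearson correlation between LLM and human judgments
r = np.corrcoef(llm_ratings, human_ratings)[0, 1]
print(f"r = {r:.2f}")
```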
5/10

We tested this idea on the classic lawyer–engineer base-rate neglect task, asking GPT-4 and LLaMA 3.3 for typicality ratings of how strongly traits (like “kind”) are associated with groups (like “nurse”), as a proxy for p(trait|group).
October 16, 2025 at 4:17 PM
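A minimal sketch of what such a typicality query could look like. The wording and scale are hypothetical, not the paper's actual prompt:

```python
# Hypothetical typicality-rating prompt, used as a proxy for
# p(trait | group); the phrasing below is illustrative only.
def typicality_prompt(trait: str, group: str, scale: int = 10) -> str:
    return (
        f"On a scale from 1 to {scale}, how typical is the trait "
        f"'{trait}' of a {group}? Answer with a single number."
    )

prompt = typicality_prompt("kind", "nurse")
print(prompt)
```

Sending this prompt to a model and parsing the number back would yield one belief-strength estimate per trait-group pair.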
4/10

Could LLMs help? 🤖

For once, having human-like biases is desirable! Because LLMs are trained on vast amounts of human text, they implicitly encode typical associations, and may be great at measuring belief strength!
October 16, 2025 at 4:17 PM
3/10

We argue that measuring “belief strength” is a major bottleneck in reasoning research, which mostly relies on conflict vs. no-conflict items.

It requires costly human ratings and is rarely done parametrically, limiting the development of theoretical & computational models of biased reasoning.
October 16, 2025 at 4:17 PM
2/10

Cognitive biases often involve a mental conflict between intuitive beliefs (“nurses are kind”) and logical or probabilistic information (base rates of 995 vs. 5). 🤯

But how strong is the pull of that belief?
October 16, 2025 at 4:17 PM
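To see why the base rates should dominate, here is a quick Bayes calculation for the classic 995 vs. 5 setup. The likelihoods stand in for belief strength and are assumed values, not estimates from the paper:

```python
# Worked Bayes example for the classic base-rate setup (995 vs. 5).
# The likelihoods are illustrative stand-ins for belief strength.
p_nurse = 995 / 1000          # prior: 995 nurses in the sample
p_other = 5 / 1000            # prior: 5 members of the other group
p_kind_given_nurse = 0.80     # assumed: "kind" is typical of nurses
p_kind_given_other = 0.95     # assumed: even more typical of the other group

# Posterior probability that a randomly drawn "kind" person is a nurse
posterior = (p_kind_given_nurse * p_nurse) / (
    p_kind_given_nurse * p_nurse + p_kind_given_other * p_other
)
print(round(posterior, 3))  # ~0.994: base rates swamp even a strong belief
```

Participants who answer with the stereotyped group anyway are neglecting the base rates, and how often they do so should depend on how strong the belief's pull is.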