Polina Kirichenko
polkirichenko.bsky.social
ML researcher
While we find that a carefully crafted system prompt can boost abstention performance, it doesn't fundamentally address the core problem: a lack of reasoning about uncertainty!
See our paper for many more results!

7/9
June 16, 2025 at 10:03 PM
We find that reasoning models very often hallucinate missing context in the reasoning chain, and while they sometimes express uncertainty and caveats within the reasoning chain, they still produce a confident final answer. We hypothesize this arises from biases in data & rewards in RLVR.

6/9
June 16, 2025 at 10:03 PM
Moreover, incorporating test-time scaling as in s1 @Muennighoff et al makes things even worse!
Allocating more reasoning budget generally improves accuracy and hurts abstention.

5/9
June 16, 2025 at 10:03 PM
Remarkably, we find that reasoning post-training hurts (!) abstention performance!
We evaluated the RLVR model from Tulu @natolambert et al., s1, and DeepSeek R1 Distill models and found consistent improvements in accuracy and drops in abstention compared to instruct models.

4/9
June 16, 2025 at 10:03 PM
We curate 20 uncertainty datasets in different scenarios and evaluate 20 frontier LLMs, and find that most scenarios remain challenging even for the best models!
This allows us to conduct a systematic study of what helps and hurts abstention performance.

3/9
June 16, 2025 at 10:03 PM
LLMs are great at solving concrete problems, but how well do they handle uncertainty? There are many questions with no direct answer!
We build a diverse benchmark spanning 6 abstention scenarios (underspecification, staleness, …) and various domains (medicine, social bias, …).
June 16, 2025 at 10:03 PM
Excited to release AbstentionBench -- our paper and benchmark on evaluating LLMs’ *abstention*: the skill of knowing when NOT to answer!

Key finding: reasoning LLMs struggle with unanswerable questions and hallucinate!

Paper: arxiv.org/abs/2506.09038
Code: github.com/facebookrese...
🧵1/9
June 16, 2025 at 10:03 PM
Join us at #CVPR2025 Demographic Diversity in Computer Vision workshop tomorrow!
📅 Wednesday, June 11, 9am-6pm
📍 room 213 (main session) + Hall D (poster sessions), the Music City Center
We have an amazing lineup of speakers and panelists! Can't wait to meet you all there :)
June 10, 2025 at 1:07 PM
We are excited to announce a workshop on Demographic Diversity in Computer Vision (DemoDiv) at #CVPR 2025!

Submit your work studying various axes of demographic diversity and fairness in models and datasets and join us in Nashville in June!
Deadline: March 31st
sites.google.com/view/cvpr-20...
February 21, 2025 at 5:22 PM