See our paper for many more results!
7/9
6/9
Allocating more reasoning budget generally improves accuracy but hurts abstention.
5/9
We evaluated the RLVR model from Tulu (@natolambert et al.), s1, and DeepSeek R1 Distill models and found consistent improvements in accuracy and drops in abstention compared to instruct models.
4/9
This allows us to conduct a systematic study of what helps and hurts abstention performance.
3/9
We build a diverse benchmark spanning 6 abstention scenarios (underspecification, staleness, …) and various domains (medicine, social bias, …).
Key finding: reasoning LLMs struggle with unanswerable questions and hallucinate!
Paper: arxiv.org/abs/2506.09038
Code: github.com/facebookrese...
🧵1/9
📅 Wednesday, June 11, 9am-6pm
📍 Room 213 (main session) + Hall D (poster sessions), Music City Center
We have an amazing lineup of speakers and panelists! Can't wait to meet you all there :)
Submit your work studying various axes of demographic diversity and fairness in models and datasets and join us in Nashville in June!
Deadline: March 31st
sites.google.com/view/cvpr-20...