Jeff Cheng
jeff-cheng.bsky.social
masters @ jhu clsp
Reposted by Jeff Cheng
🚨 You are only evaluating a slice of your test-time scaling model's performance! 🚨

📈 We consider how models’ confidence in their answers changes as test-time compute increases. Reasoning longer helps models answer more confidently!

📝: arxiv.org/abs/2502.13962
February 20, 2025 at 3:14 PM