Jonathan Bragg
jbragg.bsky.social
Leading agents R&D at AI2. AI & HCI research scientist. https://jonathanbragg.com
Agent benchmarks don't measure true *AI* advances

We built one that's hard & trustworthy:
👉 AstaBench tests agents w/ *standardized tools* on 2400+ scientific research problems
👉 SOTA results across 22 agent *classes*
👉 AgentBaselines agents suite

🆕 arxiv.org/abs/2510.21652

🧵👇
November 6, 2025 at 5:01 PM