Lightnews — Scholar-powered news

Jonathan Bragg

@jbragg.bsky.social

260 followers 24 following 7 posts

Leading agents R&D at AI2. AI & HCI research scientist. https://jonathanbragg.com

Agent benchmarks don't measure true *AI* advances

We built one that's hard & trustworthy:
👉 AstaBench tests agents w/ *standardized tools* on 2400+ scientific research problems
👉 SOTA results across 22 agent *classes*
👉 AgentBaselines agents suite

🆕 arxiv.org/abs/2510.21652

🧵👇

[Image: AstaBench with abstract measurement icons]

November 6, 2025 at 5:01 PM
