Jonathan Bragg
jbragg.bsky.social
Leading agents R&D at AI2. AI & HCI research scientist. https://jonathanbragg.com
Brooke Vlahos, Peter Clark, Doug Downey, @yoavgo.bsky.social, Ashish Sabharwal, Daniel S. Weld
November 6, 2025 at 5:01 PM
Amanpreet Singh, Harshit Surana, Aryeh Tiktinsky, Rosni Vasu, @guywiener.bsky.social, Chloe Anastasiades, Stefan Candra, Jason Dunkelberger, Dan Emery, Rob Evans, Malachi Hamada, Regan Huff, Rodney Kinney, Matt Latzke, Jaron Lochner, Ruben Lozano-Aguilera, Cecile Nguyen, Smita Rao, Amber Tanaka...
November 6, 2025 at 5:01 PM
🙏 Many thanks to my @ai2.bsky.social teammates: Mike D’Arcy, @nbalepur.bsky.social, Dan Bareket, Bhavana Dalvi, @sergeyf.bsky.social, Dany Haddad, Jena D. Hwang, @peterjansen-ai.bsky.social, Varsha Kishore, Bodhisattwa Majumder, @arnaik19.bsky.social, Sigal Rahamimov, Kyle Richardson...
November 6, 2025 at 5:01 PM
We tested 22 agent classes—more *kinds* than other benchmarks

🤖AgentBaselines makes them reusable, incl. our SOTA science agents: github.com/allenai/agent-baselines

📚Blog: allenai.org/blog/astabench
📄Paper: arxiv.org/abs/2510.21652
📊Leaderboard: huggingface.co/spaces/allenai/asta-bench-leaderboard
November 6, 2025 at 5:01 PM
🛠️AstaBench is the first to provide reproducible (date-limited) large-scale search tools—plus a full scientific research environment for agents.

📊Our leaderboard highlights agents that use these tools, enabling more controlled measurement of *AI*. (We measure LLM costs too.)
November 6, 2025 at 5:01 PM
@kylelo.bsky.social your gifs are an unapproved manipulation of my human attention
October 9, 2025 at 9:06 PM