🤖AgentBaselines makes them reusable, incl. our SOTA science agents: github.com/allenai/agent-baselines
📚Blog: allenai.org/blog/astabench
📄Paper: arxiv.org/abs/2510.21652
📊Leaderboard: huggingface.co/spaces/allenai/asta-bench-leaderboard
📊Our leaderboard highlights agents that use these tools, enabling more controlled measurement of *AI*. (We measure LLM costs too.)