🏆 Claude-3.5-Sonnet is insanely good on WorkArena L2
🪨 WorkArena L3 is insanely hard
🤖 o1-mini is quite good across many benchmarks
💲 o1 is very expensive :)
See the leaderboard:
huggingface.co/spaces/Servi...
📃https://arxiv.org/abs/2412.05467
Or our open-source tools:
🤖https://github.com/ServiceNow/AgentLab
💪https://github.com/ServiceNow/BrowserGym
🎯https://github.com/ServiceNow/WorkArena