Lightnews — Scholar-powered news

@tlsdc.bsky.social

32 followers 30 following 0 posts

Reposted

Alexandre Lacoste

@alex-lacoste.bsky.social

Notable findings:
🏆Claude-3.5-Sonnet is insanely good on WorkArena L2
🪨 WorkArena L3 is insanely hard
🤖o1-mini is quite good across many benchmarks
💲o1 is very expensive :)

See the leaderboard:
huggingface.co/spaces/Servi...

December 12, 2024 at 5:55 PM

Reposted

Alexandre Lacoste

@alex-lacoste.bsky.social

Visit our paper
📃https://arxiv.org/abs/2412.05467
Or our open-source tools:
🤖https://github.com/ServiceNow/AgentLab
💪https://github.com/ServiceNow/BrowserGym
🎯https://github.com/ServiceNow/WorkArena

December 12, 2024 at 5:55 PM

Reposted

Alexandre Lacoste

@alex-lacoste.bsky.social

We’re really excited to release this large collaborative work for unifying web agent benchmarks under the same roof.

In this TMLR paper, we take an in-depth look at #BrowserGym and #AgentLab. We also present some unexpected results from Claude-3.5-Sonnet.

December 12, 2024 at 5:55 PM

Reposted

Alexandre Lacoste

@alex-lacoste.bsky.social

🧵-1
We are thrilled to release #AgentLab, a new open-source package for developing and evaluating web agents. This builds on the new #BrowserGym package which supports 10 different benchmarks, including #WebArena.

AgentLab diagram.

The image describes AgentLab, a framework for efficient parallel experiments with agents. It highlights:

Core Agent Features: Dynamic Prompting and a Unified LLM API for interacting with large language models.

BrowserGym Platform: A tool for testing agents on benchmarks like WebArena, WorkArena, MiniWoB, and others.

Key Features: Reproducibility, a Unified Leaderboard, an analysis tool called Xray, and a Dataset for sharing agent traces.

Blue elements represent AgentLab components.

December 3, 2024 at 9:02 PM
