Lightnews — Scholar-powered news

Tom Dörr

@tom-doerr.bsky.social

November 5, 2025 at 2:17 PM

@tmlr-pub.bsky.social

The BrowserGym Ecosystem for Web Agent Research

Thibault Le Sellier de Chezelles, Maxime Gasse, Alexandre Lacoste et al.

Action editor: Lingpeng Kong

https://openreview.net/forum?id=5298fKGmv3

#agentlab #agent #agents

April 7, 2025 at 12:07 AM

Xing Han Lu

@xhluca.bsky.social

WebArena by Zhou et al.; AgentLab and BrowserGym by @servicenow.bsky.social allowed us to explore the latest agents; @gradio-hf.bsky.social enabled us to design UIs for implementing our ARIA framework; and @hf.co provided a hosting platform for 100GB+ of artifacts.

bsky.app/profile/xhlu...

Xing Han Lu @xhluca.bsky.social · Mar 10

Agents like OpenAI Operator can solve complex computer tasks, but what happens when users use them to cause harm, e.g. spread misinformation?

To find out, we introduce SafeArena (safearena.github.io), a benchmark to assess the capabilities of web agents to complete harmful web tasks. A thread 👇

March 10, 2025 at 5:45 PM

TMLR Published Papers

@tmlr-pub.bsky.social

New #Expert Certification:

The BrowserGym Ecosystem for Web Agent Research

Thibault Le Sellier de Chezelles, Maxime Gasse, Alexandre Lacoste et al.

https://openreview.net/forum?id=5298fKGmv3

#agentlab #agent #agents

March 9, 2025 at 1:12 AM

Xing Han Lu

@xhluca.bsky.social

Really glad that this work is out!

In my opinion, AgentLab and BrowserGym will be very important components of web agent research and will become part of most web agent researchers' toolkits.

Read the paper if you are interested in learning more about what the platform covers!

Alexandre Lacoste @alex-lacoste.bsky.social · Dec 12

We’re really excited to release this large collaborative work for unifying web agent benchmarks under the same roof.

In this TMLR paper, we dive in depth into #BrowserGym and #AgentLab. We also present some unexpected results from Claude 3.5 Sonnet.

December 14, 2024 at 1:16 AM

Alexandre Lacoste

@alex-lacoste.bsky.social

Just found this cool blog post discussing #AgentLab, #BrowserGym and #TapeAgent

medium.com/@carolynduby...

How ServiceNow Delivers Production Grade AI Agents

Large Language Model (LLM) assistants such as ChatGPT have taken the world by storm and revolutionized many everyday tasks but Generative AI…

medium.com

December 13, 2024 at 4:13 PM

Alexandre Lacoste

@alex-lacoste.bsky.social

Visit our paper
📃https://arxiv.org/abs/2412.05467
Or our open-source tools:
🤖https://github.com/ServiceNow/AgentLab
💪https://github.com/ServiceNow/BrowserGym
🎯https://github.com/ServiceNow/WorkArena

December 12, 2024 at 5:55 PM

Alexandre Lacoste

@alex-lacoste.bsky.social

We’re really excited to release this large collaborative work for unifying web agent benchmarks under the same roof.

In this TMLR paper, we dive in depth into #BrowserGym and #AgentLab. We also present some unexpected results from Claude 3.5 Sonnet.

December 12, 2024 at 5:55 PM

Alexandre Lacoste

@alex-lacoste.bsky.social

Very excited to see this work coming out from @servicenowresearch.bsky.social. Can't wait to test a trained model in #AgentLab

Juan Rodriguez @joanrod.bsky.social · Dec 10

🎉 Excited to introduce BigDocs!
An open, transparent multimodal dataset designed for:
📄 Documents
🌐 Web content
🖥️ GUI understanding
👨‍💻 Code generation from images
We’re also launching BigDocs-Bench:
➡️ Document, Web, GUI Visual reasoning
➡️ Converting images into JSON, Markdown, LaTeX, SVG, and more!

December 10, 2024 at 10:55 PM

SmolKaiju

@smolkaiju.bsky.social

Oops, sorry, it’s actually called AgentLab, but it’s from the same people.

From what I understand it’s a way to run parallel experiments using benchmarks for web agents.

I’m not familiar with agent benchmarks so I’m not sure how good the included ones are.

December 8, 2024 at 1:13 AM

Éric Gaubert

@kalydeoo.bsky.social

Discover AgentLab, an open-source framework for developing and evaluating web agents

➡️ www.actuia.com/actualite/ag... via @ActuIA

AgentLab, an open-source framework for developing and evaluating web agents

Launched by ServiceNow, AgentLab is an open-source framework that aims to make it easier to develop and evaluate web agents. Its main objective is…

www.actuia.com

December 7, 2024 at 9:00 AM

drtimos.bsky.social

@drtimos.bsky.social

Thoughts on this? >> ServiceNow Releases AgentLab: A New Open-Source Python Package for Developing and Evaluating Web Agents: Developing web agents is a challenging area of AI research that has attracted significant attention in recent… >> Comment below! #mhealth #AI #IoT #healthtech #industry40

ServiceNow Releases AgentLab: A New Open-Source Python Package for Developing and Evaluating Web Agents

Developing web agents is a challenging area of AI research that has attracted significant attention in recent years. As the web becomes more dynamic and complex, it demands advanced capabilities from agents that interact autonomously with online…

dlvr.it

December 5, 2024 at 8:17 AM

Alexandre Lacoste

@alex-lacoste.bsky.social

🔍 Analyse your agent's behavior using AgentLab-XRay, a custom UI allowing you to navigate all your experiments.

December 3, 2024 at 9:02 PM

Alexandre Lacoste

@alex-lacoste.bsky.social

AgentLab: github.com/ServiceNow/AgentLab/
🚀 Easy large-scale parallel agent experiments
🔧 Building blocks for crafting agents over BrowserGym
🤖 Unified LLM API for seamless integration
🔁 Reproducibility features for consistent results
🏆 Unified Leaderboard across multiple benchmarks

December 3, 2024 at 9:02 PM

Alexandre Lacoste

@alex-lacoste.bsky.social

🧵-1
We are thrilled to release #AgentLab, a new open-source package for developing and evaluating web agents. It builds on the new #BrowserGym package, which supports 10 different benchmarks, including #WebArena.

AgentLab diagram.

The image describes AgentLab, a framework for efficient parallel experiments with agents. It highlights:

Core Agent Features: Dynamic Prompting and a Unified LLM API for interacting with large language models.

BrowserGym Platform: A tool for testing agents on benchmarks like WebArena, WorkArena, MiniWoB, and others.

Key Features: Reproducibility, a Unified Leaderboard, an analysis tool called Xray, and a Dataset for sharing agent traces.

Blue elements represent AgentLab components.

December 3, 2024 at 9:02 PM
