Lightnews — Scholar-powered news

Alexandre Lacoste

@alex-lacoste.bsky.social

🚨 Is #WorkArena on the verge of being solved? Or did GPT-5 just get trained on it?

🔥While some benchmarks show modest gains, GPT-5 is crushing WorkArena L2🔥
➡️ 69.4% avg success vs. ~40% for next best🤯
➡️ Complex tasks, up to 100 steps, 5–20 min for humans

August 21, 2025 at 6:23 PM

Alexandre Lacoste

@alex-lacoste.bsky.social

🚨 New benchmark drop!
[#ICCV2025] Our paper "GEOBench-VLM: Benchmarking Vision-Language Models for Geospatial Tasks" is accepted at ICCV 2025 in Honolulu, Hawaii! 🌺
Let's dive into what makes it exciting: 🧵

July 2, 2025 at 12:47 PM

Alexandre Lacoste

@alex-lacoste.bsky.social

March 26, 2025 at 6:50 PM

Reposted by Alexandre Lacoste

Nouha Dziri

@nouhadziri.bsky.social

Interested in knowing more about LLM agents and in contributing to this topic?🚀

📢We're thrilled to announce REALM: The first Workshop for Research on Agent Language Models 🤖 #ACL2025NLP in Vienna 🎻
We have an exciting lineup of speakers

🗓️ Submit your work by *March 1st*
@aclmeeting.bsky.social

January 23, 2025 at 2:29 PM

Reposted by Alexandre Lacoste

Nouha Dziri

@nouhadziri.bsky.social

Got ideas to share and want to learn about the latest progress?

Consider submitting your work! 🔗https://realm-workshop.github.io

Organizers:
@shikharmurty.bsky.social @ehsk0.bsky.social @xhluca.bsky.social @alex-lacoste.bsky.social @hanna-nlp.bsky.social @gneubig.bsky.social

January 23, 2025 at 2:29 PM

Alexandre Lacoste

@alex-lacoste.bsky.social

Just found this cool blogpost discussing #AgentLab, #BrowserGym and #TapeAgent

medium.com/@carolynduby...

How ServiceNow Delivers Production Grade AI Agents

Large Language Model(LLM) assistants such as ChatGPT have taken the world by storm and revolutionized many everyday tasks but Generative AI…

medium.com

December 13, 2024 at 4:13 PM

Alexandre Lacoste

@alex-lacoste.bsky.social

We’re really excited to release this large collaborative work for unifying web agent benchmarks under the same roof.

In this TMLR paper, we take an in-depth look at #BrowserGym and #AgentLab. We also present some unexpected performance results from Claude 3.5 Sonnet

December 12, 2024 at 5:55 PM

Alexandre Lacoste

@alex-lacoste.bsky.social

Join us for a co-hosted Happy Hour
NeurIPS 2024
with ServiceNow and IMean.ai
as we explore the cutting edge of WebAgent development!

📅 Date: Dec 13th 6:00pm PST
📍 Location: 15-min walk from NeurIPS; see details after RSVP
🎉 RSVP Here: lu.ma/rw9x9vc6

December 12, 2024 at 4:24 PM

Alexandre Lacoste

@alex-lacoste.bsky.social

Very excited to see this work coming out from @servicenowresearch.bsky.social. Can't wait to test a trained model in #AgentLab

Juan Rodriguez @joanrod.bsky.social · Dec 10

🎉 Excited to introduce BigDocs!
An open, transparent multimodal dataset designed for:
📄 Documents
🌐 Web content
🖥️ GUI understanding
👨‍💻 Code generation from images
We’re also launching BigDocs-Bench:
➡️ Document, Web, GUI Visual reasoning
➡️ Converting images into JSON, Markdown, LaTeX, SVG, and more!

December 10, 2024 at 10:55 PM

Reposted by Alexandre Lacoste

Juan Rodriguez

@joanrod.bsky.social

🎉 Excited to introduce BigDocs!
An open, transparent multimodal dataset designed for:
📄 Documents
🌐 Web content
🖥️ GUI understanding
👨‍💻 Code generation from images
We’re also launching BigDocs-Bench:
➡️ Document, Web, GUI Visual reasoning
➡️ Converting images into JSON, Markdown, LaTeX, SVG, and more!

December 10, 2024 at 6:34 PM

Alexandre Lacoste

@alex-lacoste.bsky.social

Awesome Starter Pack. Thanks @xhluca.bsky.social

Xing Han Lu @xhluca.bsky.social · Dec 5

I've created a starter pack of researchers working on digital agents (focusing on web, mobile and OS agents).

I am missing a lot, and many are not on bsky yet, so if I missed you or someone you know, please send me a DM with the link to a relevant paper and I will update the starter pack!

December 6, 2024 at 12:41 AM

Alexandre Lacoste

@alex-lacoste.bsky.social

🧵-1
We are thrilled to release #AgentLab, a new open-source package for developing and evaluating web agents. This builds on the new #BrowserGym package which supports 10 different benchmarks, including #WebArena.

AgentLab diagram.

The image describes AgentLab, a framework for efficient parallel experiments with agents. It highlights:

Core Agent Features: Dynamic Prompting and a Unified LLM API for interacting with large language models.

BrowserGym Platform: A tool for testing agents on benchmarks like WebArena, WorkArena, MiniWoB, and others.

Key Features: Reproducibility, a Unified Leaderboard, an analysis tool called Xray, and a Dataset for sharing agent traces.

Blue elements represent AgentLab components.

December 3, 2024 at 9:02 PM
