Alexandre Lacoste
alex-lacoste.bsky.social
MegaSenior Research Scientist at ServiceNow Research, Former Google. WebAgents, Remote Sensing, Climate Change, Opinions are my own
🚨 Is #WorkArena on the verge of being solved? Or did GPT-5 just get trained on it?

🔥While some benchmarks show modest gains, GPT-5 is crushing WorkArena L2🔥
➡️ 69.4% avg success vs. ~40% for next best🤯
➡️ Complex tasks, up to 100 steps, 5–20 min for humans
August 21, 2025 at 6:23 PM
🚨 New benchmark drop!
[#ICCV2025] Our paper "GEOBench-VLM: Benchmarking Vision-Language Models for Geospatial Tasks" is accepted at ICCV 2025 in Honolulu, Hawaii! 🌺
Let's dive into what makes it exciting: 🧵
July 2, 2025 at 12:47 PM
Reposted by Alexandre Lacoste
Interested in learning more about LLM agents and contributing to this topic?🚀

📢We're thrilled to announce REALM: The first Workshop for Research on Agent Language Models 🤖 #ACL2025NLP in Vienna 🎻
We have an exciting lineup of speakers

🗓️ Submit your work by *March 1st*
@aclmeeting.bsky.social
January 23, 2025 at 2:29 PM
Reposted by Alexandre Lacoste
Got ideas to share and want to learn about the latest progress?

Consider submitting your work! 🔗https://realm-workshop.github.io

Organizers:
@shikharmurty.bsky.social @ehsk0.bsky.social @xhluca.bsky.social @alex-lacoste.bsky.social @hanna-nlp.bsky.social @gneubig.bsky.social
January 23, 2025 at 2:29 PM
We’re really excited to release this large collaborative work for unifying web agent benchmarks under the same roof.

In this TMLR paper, we take an in-depth look at #BrowserGym and #AgentLab. We also report some unexpected results from Claude 3.5 Sonnet.
December 12, 2024 at 5:55 PM
Join us for a NeurIPS 2024 Happy Hour co-hosted with ServiceNow and IMean.ai as we explore the cutting edge of WebAgent development!

📅 Date: Dec 13th 6:00pm PST
📍 Location: 15-minute walk from NeurIPS; see details after RSVP
🎉 RSVP Here: lu.ma/rw9x9vc6
December 12, 2024 at 4:24 PM
Very excited to see this work coming out from @servicenowresearch.bsky.social. Can't wait to test a trained model in #AgentLab
🎉 Excited to introduce BigDocs!
An open, transparent multimodal dataset designed for:
📄 Documents
🌐 Web content
🖥️ GUI understanding
👨‍💻 Code generation from images
We’re also launching BigDocs-Bench:
➡️ Document, web, and GUI visual reasoning
➡️ Converting images into JSON, Markdown, LaTeX, SVG, and more!
December 10, 2024 at 10:55 PM
Awesome starter pack! Thanks, @xhluca.bsky.social
I've created a starter pack of researchers working on digital agents (focusing on web, mobile and OS agents).

I am missing a lot, and many are not on bsky yet, so if I missed you or someone you know, please send me a DM with the link to a relevant paper and I will update the starter pack!
December 6, 2024 at 12:41 AM
🧵-1
We are thrilled to release #AgentLab, a new open-source package for developing and evaluating web agents. This builds on the new #BrowserGym package which supports 10 different benchmarks, including #WebArena.
December 3, 2024 at 9:02 PM