Alexandre Lacoste
alex-lacoste.bsky.social
MegaSenior Research Scientist at ServiceNow Research, Former Google. WebAgents, Remote Sensing, Climate Change. Opinions are my own.
What is your guess? Why is GPT-5 shining so much on WorkArena in contrast to other benchmarks?

Trust me, this is the last time we're making a benchmark without a hidden test set.
August 21, 2025 at 6:23 PM
🔍 Analyse your agent's behavior using AgentLab-XRay, a custom UI allowing you to navigate all your experiments.
December 3, 2024 at 9:02 PM
AgentLab: github.com/ServiceNow/AgentLab/
🚀 Easy large-scale parallel agent experiments
🔧 Building blocks for crafting agents over BrowserGym
🤖 Unified LLM API for seamless integration
🔁 Reproducibility features for consistent results
🏆 Unified Leaderboard across multiple benchmarks
December 3, 2024 at 9:02 PM