Ori Yoran
oriyoran.bsky.social
Ori Yoran
@oriyoran.bsky.social
NLP PhD Candidate at Tel Aviv University (Google PhD Fellow) | Working on LLM reasoning and Language Agents | Prev. Research intern at Meta FAIR and AI2
Reposted by Ori Yoran
"One bad apple can spoil the bunch 🍎", and that's doubly true for language agents!
Our new paper shows how monitoring and intervention can prevent agents from going rogue, boosting performance by up to 20%. We're also releasing a new multi-agent environment 🕵️‍♂️
February 13, 2025 at 2:09 PM
Reposted by Ori Yoran
Thrilled to announce our new work TestGenEval, a benchmark that measures unit test generation and test completion capabilities. This work was done in collaboration with the FAIR CodeGen team.

Preprint: arxiv.org/abs/2410.00752
Leaderboard: testgeneval.github.io/leaderboard....
December 19, 2024 at 8:59 PM
Reposted by Ori Yoran
We’re really excited to release this large collaborative work for unifying web agent benchmarks under the same roof.

In this TMLR paper, we dive in-depth into #BrowserGym and #AgentLab. We also present some unexpected performances from Claude 3.5-Sonnet
December 12, 2024 at 5:55 PM
Reposted by Ori Yoran
🧵-1
We are thrilled to release #AgentLab, a new open-source package for developing and evaluating web agents. This builds on the new #BrowserGym package which supports 10 different benchmarks, including #WebArena.
December 3, 2024 at 9:02 PM