Yukyung Lee
yukyunglee.bsky.social
Reposted by Yukyung Lee
Can coding agents autonomously implement AI research extensions?

We introduce RExBench, a benchmark that tests whether a coding agent can implement a novel experiment based on existing research and code.

Finding: Most agents we tested had a low success rate, but there is promise!
July 2, 2025 at 3:40 PM