Lightnews — Scholar-powered news

Yukyung Lee

@yukyunglee.bsky.social

Reposted by Yukyung Lee

Sebastian Schuster

@sebschu.bsky.social

Can coding agents autonomously implement AI research extensions?

We introduce RExBench, a benchmark that tests if a coding agent can implement a novel experiment based on existing research and code.

Finding: Most agents we tested had a low success rate, but there is promise!

[Image: Screenshot of the RExBench preprint title page.]

July 2, 2025 at 3:40 PM
