Lightnews — Scholar-powered news

Claudia Mamede

@claudiarmamede.bsky.social

19 followers 17 following 0 posts

SE PhD student at Carnegie Mellon University and University of Porto

Reposted by Claudia Mamede

Kush Jain

@kjain14.bsky.social

Thrilled to announce our new work TestGenEval, a benchmark that measures unit test generation and test completion capabilities. This work was done in collaboration with the FAIR CodeGen team.

Preprint: arxiv.org/abs/2410.00752
Leaderboard: testgeneval.github.io/leaderboard....

December 19, 2024 at 8:59 PM

Reposted by Claudia Mamede

Dr. Claire Le Goues

@clegoues.bsky.social

And now that we’re all here, some work!🚨 Are Large Language Models Memorizing Bug Benchmarks? 🚨
There’s growing concern that LLMs for SE are prone to data leakage, but no one has quantified it... until now. 🕵️‍♂️ 1/

arxiv.org

November 26, 2024 at 4:06 PM
