Claudia Mamede
@claudiarmamede.bsky.social
SE PhD student at Carnegie Mellon University and University of Porto
Reposted by Claudia Mamede
Thrilled to announce our new work TestGenEval, a benchmark that measures unit test generation and test completion capabilities. This work was done in collaboration with the FAIR CodeGen team.

Preprint: arxiv.org/abs/2410.00752
Leaderboard: testgeneval.github.io/leaderboard....
December 19, 2024 at 8:59 PM
Reposted by Claudia Mamede
And now that we’re all here, some work!🚨 Are Large Language Models Memorizing Bug Benchmarks? 🚨
There’s growing concern that LLMs for SE are prone to data leakage, but no one has quantified it... until now. 🕵️‍♂️ 1/
November 26, 2024 at 4:06 PM