Dr. Claire Le Goues
@clegoues.bsky.social
Prof@SCS@CMU, scientist, software engineer, “heartless wench”, mama. It's pronounced "Le Gwess". Mostly academia, tech/SE, PGH. She/her
We have other recent results on LLM-based security vulnerability detection that include similarly alarming indications of leakage on BigVul, even though that’s not the main point of the study…(s/o to the student, Aidan, not On Here.)

arxiv.org/abs/2406.05892

So…yeah…we may have a problem. 😅 /fin
November 26, 2024 at 5:00 PM
📈 Llama 3.1 (70B), trained on far more data, shows less leakage than older and smaller models like CodeGen and CodeLlama: its higher NLL and lower 5-gram match suggest only limited memorization. 5/
November 26, 2024 at 4:17 PM
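
If you want to poke at the NLL signal from the post above, here is a minimal sketch, assuming a Hugging Face causal LM; the CodeLlama checkpoint named here is an illustrative stand-in, not necessarily the paper's exact setup. The intuition: memorized text is unusually easy for the model to predict, so it scores a lower per-token NLL than comparable unseen code.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def mean_token_nll(model, tokenizer, text: str) -> float:
    """Average negative log-likelihood per token of `text` under `model`.

    Text the model has memorized from training tends to score a
    noticeably *lower* NLL than comparable unseen text.
    """
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        # Passing labels=input_ids makes the model return the mean
        # cross-entropy over the sequence, i.e., the mean per-token NLL.
        out = model(**enc, labels=enc["input_ids"])
    return out.loss.item()

# Illustrative model choice, not necessarily the checkpoints studied.
tok = AutoTokenizer.from_pretrained("codellama/CodeLlama-7b-hf")
lm = AutoModelForCausalLM.from_pretrained("codellama/CodeLlama-7b-hf")
patch = "if (index < 0) { throw new IllegalArgumentException(); }"
print(f"mean NLL: {mean_token_nll(lm, tok, patch):.3f}")

Comparing this score against the model's NLL on held-out code of similar length and style is what makes it a leakage signal rather than a raw number.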
📜 5-gram match reveals memorization: We used 5-gram match to check if models generated nearly identical outputs when given the same input. CodeGen scored 82% on Defects4J. There’s no established baseline or cutoff here, but that’s eyebrow-raisingly high. 4/
November 26, 2024 at 4:15 PM
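
Here is a minimal sketch of what a 5-gram match metric can look like, assuming whitespace tokenization and scoring the fraction of the generation's 5-grams that appear verbatim in the reference patch; the paper's exact tokenizer and definition may differ.

def ngrams(tokens: list[str], n: int = 5) -> set[tuple[str, ...]]:
    """All contiguous n-grams in a token sequence."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def five_gram_match(generated: str, reference: str) -> float:
    """Share of `generated`'s 5-grams that occur verbatim in `reference`.

    1.0 means every 5-token window of the generation appears in the
    reference: a strong hint the model is reproducing it from memory.
    """
    gen = ngrams(generated.split())
    ref = ngrams(reference.split())
    if not gen:
        return 0.0
    return len(gen & ref) / len(gen)

reference_patch = "if ( index < 0 ) { throw new IllegalArgumentException ( ) ; }"
model_output = "if ( index < 0 ) { throw new IllegalArgumentException ( ) ; }"
print(f"5-gram match: {five_gram_match(model_output, reference_patch):.0%}")  # 100%

By this definition, a score like CodeGen's 82% means most 5-token windows the model emits already exist word-for-word in the benchmark's reference patches.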
🧠 Older models memorize more: Models like CodeGen and CodeLlama show significantly higher leakage on Defects4J than newer models (e.g., Llama 3.1). They often reproduce patches verbatim, to the point that it’s weird (including comments!!) 🔥 3/
November 26, 2024 at 4:09 PM
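
On the "(including comments!!)" tell: comments don't affect program behavior, so a generation that matches a known patch comment-for-comment is recall, not re-derivation. A rough sketch, assuming Java-style comments and a deliberately naive regex stripper:

import re

# Java-style line and block comments; naive (would misfire on "//" in strings).
_COMMENT = re.compile(r"//[^\n]*|/\*.*?\*/", re.DOTALL)

def strip_comments(code: str) -> str:
    return _COMMENT.sub("", code)

def memorization_verdict(generated: str, reference: str) -> str:
    if generated.strip() == reference.strip():
        return "verbatim match, comments included (strong memorization signal)"
    if strip_comments(generated).split() == strip_comments(reference).split():
        return "same code modulo comments/whitespace"
    return "differs"

ref = "int mid = lo + (hi - lo) / 2; // avoid overflow"
gen = "int mid = lo + (hi - lo) / 2; // avoid overflow"
print(memorization_verdict(gen, ref))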