Dr. Claire Le Goues
@clegoues.bsky.social
Prof@SCS@CMU, scientist, software engineer, “heartless wench”, mama. It's pronounced "Le Gwess". Mostly academia, tech/SE, PGH. She/her
We have other recent results on LLM-based security vulnerability detection that include similarly alarming indications of leakage on BigVul, even though that’s not the main point of the study…(s/o to the student, Aidan, not On Here.)

arxiv.org/abs/2406.05892

So…yeah…we may have a problem. 😅 /fin
November 26, 2024 at 5:00 PM
📈 Llama 3.1 (70B), trained on far more data, shows less leakage than older and smaller models like CodeGen and CodeLlama: its higher NLL and lower 5-gram match suggest only limited memorization. 5/
November 26, 2024 at 4:17 PM
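
If you want to poke at the NLL signal from the post above, here is a minimal sketch, assuming a Hugging Face causal LM; the CodeLlama checkpoint named here is an illustrative stand-in, not necessarily the paper's exact setup. The intuition: memorized text is unusually easy for the model to predict, so it scores a lower per-token NLL than comparable unseen code.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def mean_token_nll(model, tokenizer, text: str) -> float:
    """Average negative log-likelihood per token of `text` under `model`.

    Text the model has memorized from training tends to score a
    noticeably *lower* NLL than comparable unseen text.
    """
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        # Passing labels=input_ids makes the model return the mean
        # cross-entropy over the sequence, i.e., the mean per-token NLL.
        out = model(**enc, labels=enc["input_ids"])
    return out.loss.item()

# Illustrative model choice, not necessarily the checkpoints studied.
tok = AutoTokenizer.from_pretrained("codellama/CodeLlama-7b-hf")
lm = AutoModelForCausalLM.from_pretrained("codellama/CodeLlama-7b-hf")
patch = "if (index < 0) { throw new IllegalArgumentException(); }"
print(f"mean NLL: {mean_token_nll(lm, tok, patch):.3f}")

Comparing this score against the model's NLL on held-out code of similar length and style is what makes it a leakage signal rather than a raw number.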
📜 5-gram match reveals memorization: We used 5-gram match to check if models generated nearly identical outputs when given the same input. CodeGen scored 82% on Defects4J. There’s no established baseline or cutoff here, but that’s eyebrow-raisingly high. 4/
November 26, 2024 at 4:15 PM
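
Here is a minimal sketch of what a 5-gram match metric can look like, assuming whitespace tokenization and scoring the fraction of the generation's 5-grams that appear verbatim in the reference patch; the paper's exact tokenizer and definition may differ.

def ngrams(tokens: list[str], n: int = 5) -> set[tuple[str, ...]]:
    """All contiguous n-grams in a token sequence."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def five_gram_match(generated: str, reference: str) -> float:
    """Share of `generated`'s 5-grams that occur verbatim in `reference`.

    1.0 means every 5-token window of the generation appears in the
    reference: a strong hint the model is reproducing it from memory.
    """
    gen = ngrams(generated.split())
    ref = ngrams(reference.split())
    if not gen:
        return 0.0
    return len(gen & ref) / len(gen)

reference_patch = "if ( index < 0 ) { throw new IllegalArgumentException ( ) ; }"
model_output = "if ( index < 0 ) { throw new IllegalArgumentException ( ) ; }"
print(f"5-gram match: {five_gram_match(model_output, reference_patch):.0%}")  # 100%

By this definition, a score like CodeGen's 82% means most 5-token windows the model emits already exist word-for-word in the benchmark's reference patches.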
🧠 Older models memorize more: Models like CodeGen and CodeLlama show significantly higher leakage on Defects4J than newer models (e.g., Llama 3.1). They often reproduce patches verbatim, to the point that it’s weird (including comments!!) 🔥 3/
November 26, 2024 at 4:09 PM
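
On the "(including comments!!)" tell: comments don't affect program behavior, so a generation that matches a known patch comment-for-comment is recall, not re-derivation. A rough sketch, assuming Java-style comments and a deliberately naive regex stripper:

import re

# Java-style line and block comments; naive (would misfire on "//" in strings).
_COMMENT = re.compile(r"//[^\n]*|/\*.*?\*/", re.DOTALL)

def strip_comments(code: str) -> str:
    return _COMMENT.sub("", code)

def memorization_verdict(generated: str, reference: str) -> str:
    if generated.strip() == reference.strip():
        return "verbatim match, comments included (strong memorization signal)"
    if strip_comments(generated).split() == strip_comments(reference).split():
        return "same code modulo comments/whitespace"
    return "differs"

ref = "int mid = lo + (hi - lo) / 2; // avoid overflow"
gen = "int mid = lo + (hi - lo) / 2; // avoid overflow"
print(memorization_verdict(gen, ref))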