Jaydeep Borkar
@jaydeepborkar.bsky.social
Visiting Researcher at Meta NYC🦙 and PhD student at Northeastern. Organizer at the Trustworthy ML Initiative (trustworthyml.org). Security & privacy in language models + mountain biking.

jaydeepborkar.github.io
Very excited to be joining Meta GenAI as a Visiting Researcher starting this June in New York City!🗽 I’ll be continuing my work on studying memorization and safety in language models.

If you’re in NYC and would like to hang out, please message me :)
May 15, 2025 at 3:18 AM
Really liked this slide by @afedercooper.bsky.social on categorizing extraction vs regurgitation vs memorization of training data at CS&Law today!
March 25, 2025 at 9:11 PM
*Takeaway*: these results underscore the need for more holistic memorization audits, where examples that aren’t extracted at a particular point in time are also evaluated for potential risks. E.g., we find that multiple models exhibit equal or greater levels of assisted memorization.
March 2, 2025 at 7:20 PM
—extends to LLMs: removing one layer of memorized PII exposes a 2nd layer, & so forth. We find this to be true even for random removals (which simulate opt-out requests). PII on the verge of memorization surfaces after others are removed.
March 2, 2025 at 7:20 PM
We find that removing extracted PII from the data & re-finetuning from scratch leads to the extraction of other PII. However, this phenomenon stops after a certain number of iterations. Our results confirm that this layered memorization, termed the Onion Effect (Carlini et al. 2022)…
March 2, 2025 at 7:20 PM
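To make the loop concrete, here is a minimal sketch of the removal-and-retrain probe the thread describes. `finetune` and `extract_pii` are hypothetical stand-ins for a full fine-tuning run and a PII-extraction attack; neither is from the paper.

```python
# Minimal sketch of an Onion Effect probe, not the paper's exact setup.
# `finetune` and `extract_pii` are hypothetical callables supplied by the user.

def onion_effect_probe(base_model, data, finetune, extract_pii, max_rounds=5):
    """Repeatedly remove extracted PII and re-finetune from scratch,
    recording which new PII surfaces in each round (each "layer")."""
    layers = []
    for _ in range(max_rounds):
        model = finetune(base_model, data)    # re-finetune from scratch
        extracted = extract_pii(model, data)  # PII strings extracted this round
        if not extracted:                     # the effect eventually stops
            break
        layers.append(extracted)
        # Drop every training example containing extracted PII; PII "on the
        # verge of memorization" can then surface in the next round.
        data = [ex for ex in data if not any(p in ex for p in extracted)]
    return layers
```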
We find that: 1) extraction increases substantially with the amount of PII contained in the model’s training set, & 2) inclusion of more PII leads to existing PII being at higher risk of extraction. This effect can increase extraction by over 7× in our setting.
March 2, 2025 at 7:20 PM
We run various tests to characterize the underlying cause of assisted memorization. Using our methodology, we causally remove the overlapping n-grams and find a strong correlation between n-gram overlap and assisted memorization.
March 2, 2025 at 7:20 PM
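A rough sketch of that ablation, under the simplifying assumption of whitespace tokenization (the paper's exact n-gram definition may differ): drop the fine-tuning examples that share n-grams with a target PII string, re-finetune, and compare extraction with vs. without them.

```python
# Sketch of the causal n-gram ablation. Whitespace tokenization is a
# simplification for illustration, not the paper's method.

def ngrams(text, n=3):
    """Set of word-level n-grams in a string."""
    toks = text.split()
    return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}

def remove_overlapping(data, target_pii, n=3):
    """Drop training examples whose n-grams overlap the target PII's."""
    target = ngrams(target_pii, n)
    return [ex for ex in data if not (ngrams(ex, n) & target)]
```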
The literature so far lacks a clear understanding of the complete memorization landscape throughout training. In this work, we provide a complete taxonomy & uncover novel forms of memorization that arise during training.
March 2, 2025 at 7:20 PM
We observe a phenomenon we call assisted memorization: most PII (email addresses) isn’t extracted right after it is first seen, but further fine-tuning on data containing n-grams that overlap with this PII eventually leads to its extraction. This is a key factor: it accounts for up to 1/3 of extracted PII in our settings.
March 2, 2025 at 7:20 PM
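For reference, a common way to test whether a given PII string counts as "extracted" is to prompt the model with the training context that precedes it and check whether greedy decoding reproduces it. A minimal sketch with Hugging Face transformers; the model name and decoding settings here are placeholders, not the paper's configuration.

```python
# Sketch of a prefix-prompt extraction check, not the paper's exact setup.
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")            # placeholder model
model = AutoModelForCausalLM.from_pretrained("gpt2")

def is_extracted(prefix, pii, max_new_tokens=20):
    """True if greedy decoding from the prefix reproduces the PII string."""
    inputs = tok(prefix, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=max_new_tokens,
                         do_sample=False)               # greedy decoding
    completion = tok.decode(out[0][inputs["input_ids"].shape[1]:])
    return pii in completion
```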
What happens if we fine-tune an LLM on more PII? We find that PII that wasn’t previously extracted gets extracted after fine-tuning on *other* PII. This could have implications for earlier seen data (e.g. during post-training or further fine-tuning). 🧵

paper: arxiv.org/pdf/2502.15680
March 2, 2025 at 7:20 PM