Ameya Godbole
ameyagodbole.bsky.social
PhD student USC NLP working on generalization and reasoning, prev UMassAmherst, IITG (he/him)
Hubble enables a wide range of memorization research. Analyzing the inserted biographies 🧑‍💼 alone yields rich insights, e.g., how readily different types of PII are memorized.

And there’s a lot more: book passages 📚, paraphrases 🔁, chat logs 💬, and test sets 🎯
October 24, 2025 at 6:21 PM
🪐Our core release is 8 runs:
2 data conditions (standard, perturbed) × 2 model sizes (1B, 8B) × 2 pretraining sizes (100B, 500B).

They establish *dilution* as a best practice to broadly address memorization risks — sensitive data can be diluted by scaling up the training corpus!
October 24, 2025 at 6:21 PM
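A quick sketch of how the 2 × 2 × 2 grid in the post above works out to 8 runs. The axis values come from the post; the run labels are a hypothetical naming scheme for illustration, not the official checkpoint names.

```python
from itertools import product

# Axis values as listed in the post.
data_conditions = ["standard", "perturbed"]
model_sizes = ["1B", "8B"]
pretraining_sizes = ["100B", "500B"]

# Hypothetical label format (condition-model-corpus), used only for illustration.
runs = [
    f"{condition}-{model}-{corpus}"
    for condition, model, corpus in product(
        data_conditions, model_sizes, pretraining_sizes
    )
]

for run in runs:
    print(run)
print(f"total runs: {len(runs)}")
```

Each standard run has a perturbed counterpart with the same model size and pretraining size, so the two can be compared pairwise.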
Announcing 🔭Hubble, a suite of open-source LLMs to advance the study of memorization!

Pretrained 1B/8B param models, with controlled insertion of texts designed to emulate key memorization risks: copyright (e.g., book passages), privacy (e.g., synthetic biographies), and test set contamination
October 24, 2025 at 6:21 PM