REVS offers a practical solution for post-hoc removal of sensitive information.
📄 Paper: technion-cs-nlp.github.io/REVS/
👨‍💻 Code: github.com/Tomertech/REVS
#Unlearning #NLProc #ACL2025NLP
8/8
-Logit-lens attacks
-Delta attacks
-Perturbation attacks
Critical for real-world deployment where adversaries actively try to extract "unlearned" info.
7/8
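
To make the first attack in the list concrete, here is a minimal sketch of the generic logit-lens idea: project each layer's hidden state into vocabulary space and check whether the supposedly unlearned token still shows up among the top candidates. The toy `unembed` matrix, the `hidden_states` values, and the helper names are all hypothetical; this is not the evaluation code from the REVS repository, and the paper may define the attack differently.

```python
def top_k_tokens(hidden, unembed, k):
    """Project a hidden state to vocabulary scores and return the k best token ids."""
    scores = [sum(w * x for w, x in zip(row, hidden)) for row in unembed]
    return sorted(range(len(scores)), key=lambda t: scores[t], reverse=True)[:k]


def logit_lens_exposed(hidden_states, unembed, token_id, k):
    """Return the layers at which token_id is still among the top-k projected tokens."""
    return [
        layer
        for layer, hidden in enumerate(hidden_states)
        if token_id in top_k_tokens(hidden, unembed, k)
    ]


# Toy setup (assumed values): 4 vocabulary tokens, hidden size 3.
unembed = [
    [1.0, 0.0, 0.0],  # token 0
    [0.0, 1.0, 0.0],  # token 1
    [0.0, 0.0, 1.0],  # token 2, the "sensitive" token
    [0.5, 0.5, 0.0],  # token 3
]
hidden_states = [
    [0.1, 0.2, 0.9],   # layer 0: still points toward token 2
    [0.8, 0.1, 0.05],  # layer 1
    [0.1, 0.9, 0.3],   # layer 2
]

exposed = logit_lens_exposed(hidden_states, unembed, token_id=2, k=2)
print("Layers still exposing the token:", exposed)
```

An empty result would mean the token is not recoverable this way at the chosen `k`; it says nothing about the delta or perturbation attacks, which the thread names but does not describe.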
✅Superior unlearning effectiveness
✅Better model integrity preservation
✅Stronger resistance to extraction attacks
✅Robust across different hyperparameters
6/8
Emails & URLs naturally memorized by Llama-3-8B & GPT-J-6B
Synthetic SSN dataset where we induced memorization
Real sensitive data = real evaluation!
5/8
1. Localization: Find layers & neurons most responsible for generating target tokens
2. Editing: Modify neurons in vocabulary space to demote sensitive tokens
3. Preservation: Keep general model knowledge intact
All without gradients!
4/8
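
The steps above can be illustrated with a toy, gradient-free sketch: score each neuron in vocabulary space, pick the neurons that rank the sensitive token highest, then edit a neuron until the token drops to a chosen rank. Everything here is an assumption for illustration: the tiny `unembed` matrix, the `neurons` values, the helper names, and especially the edit rule (subtracting the token's unembedding direction in small steps), which is a simplified stand-in and not the actual REVS procedure.

```python
def project_to_vocab(neuron, unembed):
    """Score every vocabulary token as the dot product of its unembedding row with the neuron."""
    return [sum(w * x for w, x in zip(row, neuron)) for row in unembed]


def token_rank(scores, token_id):
    """Rank 0 means the token has the highest score."""
    return sum(1 for s in scores if s > scores[token_id])


def select_neurons(neurons, unembed, token_id, k):
    """Localization stand-in: indices of the k neurons that rank the token highest."""
    ranks = [token_rank(project_to_vocab(n, unembed), token_id) for n in neurons]
    return sorted(range(len(neurons)), key=lambda i: ranks[i])[:k]


def demote_token(neuron, unembed, token_id, min_rank, step=0.1, max_iters=1000):
    """Editing stand-in: step away from the token's direction until its rank is at least min_rank."""
    edited = list(neuron)
    direction = unembed[token_id]
    for _ in range(max_iters):
        if token_rank(project_to_vocab(edited, unembed), token_id) >= min_rank:
            break
        edited = [x - step * d for x, d in zip(edited, direction)]
    return edited


# Toy setup (assumed values): 4 vocabulary tokens, hidden size 3.
unembed = [
    [1.0, 0.0, 0.0],  # token 0
    [0.0, 1.0, 0.0],  # token 1
    [0.0, 0.0, 1.0],  # token 2, the "sensitive" token
    [0.5, 0.5, 0.0],  # token 3
]
neurons = [
    [0.2, 0.1, 0.9],  # promotes token 2
    [0.9, 0.1, 0.0],  # promotes token 0
    [0.1, 0.3, 0.8],  # promotes token 2
]
target = 2

selected = select_neurons(neurons, unembed, target, k=2)
edited = demote_token(neurons[selected[0]], unembed, target, min_rank=3)
print("Selected neurons:", selected)
print("Rank after edit:", token_rank(project_to_vocab(edited, unembed), target))
```

In this toy setup the other tokens' scores are unchanged by the edit only because their unembedding rows happen to be orthogonal to the target's; a real model offers no such guarantee, which is why preservation is listed as its own step.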
Key insight: We identify neurons that promote sensitive tokens in vocabulary space and modify them to demote those tokens to lower ranks.
3/8
LMs can regurgitate private info from training.
Prompt: "Contact David Lewis at" → "lewis.david@email.com"
This can violate privacy regulations such as GDPR and poses serious security risks.
2/8