@tomerashuach.bsky.social
🛡️ Extraction Resistance: REVS is more robust than prior methods against sophisticated attacks:
- Logit-lens attacks
- Delta attacks
- Perturbation attacks
Critical for real-world deployment where adversaries actively try to extract "unlearned" info.
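To make the first attack concrete, here is a minimal logit-lens sketch (illustrative only; the unembedding matrix `W_U` and the hidden state are synthetic assumptions, not the paper's code). The attack decodes an intermediate hidden state directly through the unembedding matrix: if an "unlearned" token still ranks near the top at some middle layer, the information was only hidden, not removed.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, vocab = 64, 500
W_U = rng.normal(size=(d_model, vocab))            # unembedding matrix (synthetic)

def logit_lens_rank(hidden, token_id):
    """Rank of token_id when a hidden state is decoded straight to logits."""
    logits = hidden @ W_U
    return int(np.sum(logits > logits[token_id]))  # 0 = top-ranked token

# Suppose a middle-layer residual still carries the secret token's direction:
secret = 42
hidden_mid = 3.0 * W_U[:, secret] + rng.normal(scale=0.1, size=d_model)
print(logit_lens_rank(hidden_mid, secret))         # low rank => leakage detected
```

A method is extraction-resistant if the target token's rank stays low at every layer, not just at the final output.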
7/8
May 27, 2025 at 8:19 AM
🏆Results: REVS outperforms 6 strong baselines across all metrics:
✅Superior unlearning effectiveness
✅Better model integrity preservation
✅Stronger resistance to extraction attacks
✅Robust across different hyperparameters
6/8
🔬How REVS Works:
1. Localization: Find layers & neurons most responsible for generating target tokens
2. Editing: Modify neurons in vocabulary space to demote sensitive tokens
3. Preservation: Keep general model knowledge intact
All without gradients!
4/8
🔎 The Problem:
LMs can regurgitate private info from their training data.
Prompt: "Contact David Lewis at" → "lewis.david@email.com"
This violates privacy regulations like GDPR and poses serious security risks.
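A toy illustration of why this happens (not a real LM): even a trivial bigram table memorizes its training text verbatim, so a short prompt deterministically regurgitates the email. The training string and prompt here are the hypothetical example from the thread.

```python
from collections import Counter, defaultdict

train = "Contact David Lewis at lewis.david@email.com for details"
tokens = train.split()

# Count next-token frequencies, the simplest possible "language model".
bigram = defaultdict(Counter)
for a, b in zip(tokens, tokens[1:]):
    bigram[a][b] += 1

def greedy_continue(prompt_tokens, n=1):
    """Greedily extend the prompt with the most frequent next token."""
    out = list(prompt_tokens)
    for _ in range(n):
        nxt = bigram[out[-1]].most_common(1)
        if not nxt:
            break
        out.append(nxt[0][0])
    return out

print(greedy_continue(["at"], 1))   # ['at', 'lewis.david@email.com']
```

Real LMs memorize far less deterministically, but rare strings seen during training can still surface verbatim under the right prompt.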
2/8
🚨New paper at #ACL2025 Findings!
REVS: Unlearning Sensitive Information in LMs via Rank Editing in the Vocabulary Space.
LMs memorize and leak sensitive data (emails, SSNs, URLs) from their training data.
We propose a surgical method to unlearn it.
🧵👇 w/ @boknilev.bsky.social @mtutek.bsky.social
1/8