tomerashuach.bsky.social

@tomerashuach.bsky.social

🚀 Impact: As LMs become ubiquitous, protecting privacy while maintaining utility is crucial.
REVS offers a practical solution for post-hoc removal of sensitive information.

📄Paper: technion-cs-nlp.github.io/REVS/
👨‍💻Code: github.com/Tomertech/REVS

#Unlearning #NLProc #ACL2025NLP
8/8

May 27, 2025 at 8:19 AM

🛡️ Extraction Resistance: REVS is more robust than the baselines against sophisticated extraction attacks:
- Logit-lens attacks
- Delta attacks
- Perturbation attacks
This is critical for real-world deployment, where adversaries actively try to extract "unlearned" info.
7/8
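
To make the logit-lens threat concrete: a logit-lens attack projects intermediate hidden states through the unembedding matrix and checks whether the supposedly removed token still ranks highly at some layer. Below is a minimal, hypothetical sketch of such a check in plain Python. The function names, the toy matrices and the top-k success criterion are illustrative assumptions, not the attack configuration used in the paper.

```python
# Hypothetical sketch of a logit-lens extraction check (not the REVS code).
# Assumes per-layer hidden states for the last prompt position and the
# unembedding matrix W_U (d_model x vocab) are available; the final layer
# norm that real models apply before W_U is omitted for simplicity.


def project_to_vocab(h, W_U):
    """Logit lens: logits[j] = sum_i h[i] * W_U[i][j]."""
    vocab_size = len(W_U[0])
    return [sum(h[i] * W_U[i][j] for i in range(len(h))) for j in range(vocab_size)]


def top_k(logits, k):
    """Indices of the k largest logits, highest first."""
    return sorted(range(len(logits)), key=lambda j: logits[j], reverse=True)[:k]


def logit_lens_attack(hidden_states, W_U, target_id, k=5):
    """Return the layers whose projected hidden state ranks target_id in the top k.

    A non-empty result means the "unlearned" token is still recoverable from
    intermediate layers, even if the final output no longer produces it.
    """
    return [
        layer
        for layer, h in enumerate(hidden_states)
        if target_id in top_k(project_to_vocab(h, W_U), k)
    ]


# Toy example: 4-token vocabulary, identity unembedding, 3 layers.
W_U = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
hidden_states = [
    [0.0, 1.0, 0.0, 0.0],  # layer 0: token 1 on top
    [0.0, 0.2, 5.0, 0.1],  # layer 1: sensitive token 2 dominates
    [1.0, 0.0, 0.0, 0.0],  # final layer: token 2 no longer on top
]
print(logit_lens_attack(hidden_states, W_U, target_id=2, k=1))  # [1]
```

Checking every layer, not only the output, is what makes this kind of attack stronger than plain prompting: an edit that only hides the token at the last layer would still be flagged here.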

🏆Results: REVS outperforms 6 strong baselines across all metrics:
✅Superior unlearning effectiveness
✅Better model integrity preservation
✅Stronger resistance to extraction attacks
✅Robust across different hyperparameters
6/8

📊Evaluation: We curated 3 datasets of sensitive information:
- Emails & URLs naturally memorized by Llama-3-8B & GPT-J-6B
- A synthetic SSN dataset in which we induced memorization
Real sensitive data = real evaluation!
5/8

🔬How REVS Works:
1. Localization: Find layers & neurons most responsible for generating target tokens
2. Editing: Modify neurons in vocabulary space to demote sensitive tokens
3. Preservation: Keep general model knowledge intact
All without gradients!
4/8
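
A toy illustration of the localization step, assuming the common view of an MLP layer as a weighted sum of neuron output vectors: each neuron is scored by its activation times the dot product of its output vector with the target token's unembedding column, and the strongest layers and neurons are kept. The scoring rule, the helper names and the layer-selection heuristic are hypothetical simplifications; the actual REVS selection criteria are in the paper and code linked in this thread.

```python
# Hypothetical sketch of neuron localization in vocabulary space (not the
# official REVS implementation). An MLP layer writes sum_i a_i * v_i into the
# residual stream, where a_i is neuron i's activation and v_i its output
# ("value") vector. Projecting v_i through W_U shows which tokens it promotes.


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def column(W, j):
    return [row[j] for row in W]


def neuron_scores(activations, W_out, W_U, target_id):
    """Contribution of each neuron to the target token's logit: a_i * (v_i . u_t)."""
    u_t = column(W_U, target_id)
    return [a * dot(v, u_t) for a, v in zip(activations, W_out)]


def localize(layers, W_U, target_id, n_layers=1, n_neurons=2):
    """Pick the layers and neurons that push hardest toward target_id.

    layers: list of (activations, W_out) pairs, one per MLP layer.
    Returns {layer_index: [neuron indices, strongest first]}.
    """
    scores = [neuron_scores(a, W_out, W_U, target_id) for a, W_out in layers]
    # Layer strength = total positive push toward the target token.
    strength = [sum(s for s in layer_scores if s > 0) for layer_scores in scores]
    top_layers = sorted(range(len(layers)), key=lambda li: strength[li], reverse=True)[:n_layers]
    return {
        li: sorted(range(len(scores[li])), key=lambda i: scores[li][i], reverse=True)[:n_neurons]
        for li in top_layers
    }


# Toy example: d_model = vocab = 3, identity unembedding, target token 2.
W_U = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
layers = [
    ([1.0, 0.5, 2.0], [[0.0, 0.0, 1.0], [0.0, 1.0, 0.0], [1.0, 0.0, 0.0]]),
    ([3.0, 0.1, 1.0], [[0.0, 0.0, 1.0], [0.0, 0.0, 2.0], [0.0, 0.0, -1.0]]),
]
print(localize(layers, W_U, target_id=2))  # {1: [0, 1]}
```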

💡Our Solution - REVS: A novel non-gradient method that surgically removes sensitive info while preserving model capabilities.
Key insight: We identify neurons that promote sensitive tokens in vocabulary space and modify them to demote those tokens to lower ranks.
3/8
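
The key insight can be illustrated with a small, hypothetical example: view a neuron's output vector in vocabulary space by projecting it through the unembedding matrix, measure the sensitive token's rank, and push the vector away from that token's unembedding direction until the token drops below a chosen rank. This one-direction update, the helper names and the `min_rank` threshold are assumptions chosen for clarity, not the update rule REVS actually uses.

```python
# Hypothetical sketch of rank demotion in vocabulary space (a simplified
# stand-in, not the REVS update rule). We nudge a neuron's output vector v
# away from the target token's unembedding direction until the token falls
# to at least min_rank, then stop to keep the edit small.


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def column(W, j):
    return [row[j] for row in W]


def project(v, W_U):
    """Vocabulary-space view of a vector: logits[j] = v . W_U[:, j]."""
    return [dot(v, column(W_U, j)) for j in range(len(W_U[0]))]


def rank_of(logits, token_id):
    """Rank of token_id (0 = top prediction): number of strictly larger logits."""
    return sum(1 for x in logits if x > logits[token_id])


def demote(v, W_U, target_id, min_rank, step=0.5, max_iters=100):
    """Subtract small multiples of the target's unembedding direction from v."""
    u_t = column(W_U, target_id)
    norm_sq = dot(u_t, u_t)
    v = list(v)  # do not modify the caller's vector
    for _ in range(max_iters):
        if rank_of(project(v, W_U), target_id) >= min_rank:
            break
        # Each step lowers the target logit by `step`; with a non-orthogonal
        # W_U it also shifts other logits slightly.
        v = [vi - step * ui / norm_sq for vi, ui in zip(v, u_t)]
    return v


# Toy example: 4-token vocabulary, identity unembedding, sensitive token 2.
W_U = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
neuron = [0.1, 0.3, 2.0, 0.2]
edited = demote(neuron, W_U, target_id=2, min_rank=3)
print(rank_of(project(neuron, W_U), 2), rank_of(project(edited, W_U), 2))  # 0 3
print(edited)  # [0.1, 0.3, 0.0, 0.2]
```

Stopping as soon as the rank threshold is met keeps the change as small as this simple scheme allows, which loosely mirrors the goal of leaving the rest of the model's behavior intact.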

🔎 The Problem:
LMs can regurgitate private info from training.
Prompt: "Contact David Lewis at" → "lewis.david@email.com"
This can violate privacy regulations such as GDPR and pose serious security risks.
2/8
