REVS: Unlearning Sensitive Information in LMs via Rank Editing in the Vocabulary Space.
LMs memorize and leak sensitive data—emails, SSNs, URLs from their training.
We propose a surgical method to unlearn it.
🧵👇 w/ @boknilev.bsky.social @mtutek.bsky.social
1/8
LMs can regurgitate private info from training.
Prompt: "Contact David Lewis at" → "lewis.david@email.com"
This violates privacy regulations like GDPR and poses serious security risks.
2/8
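For illustration, a minimal sketch of how such regurgitation can be elicited. The model name ("gpt2"), the prompt, and the email address are placeholders, not claims about any particular model's training data:

```python
# Toy illustration of memorized-text regurgitation.
# Assumptions: any HuggingFace causal LM; "gpt2" is a placeholder and the
# email address is fictional.
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# If the training data contained "Contact David Lewis at lewis.david@email.com",
# greedy decoding can reproduce the address verbatim from this prefix.
inputs = tok("Contact David Lewis at", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=12, do_sample=False)
print(tok.decode(out[0], skip_special_tokens=True))
```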
1. Localization: Find layers & neurons most responsible for generating target tokens
2. Editing: Modify neurons in vocabulary space to demote sensitive tokens
3. Preservation: Keep general model knowledge intact
All without gradients!
4/8
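Below is a minimal sketch of the rank-editing idea behind steps 1–3. It is not the official REVS implementation: the function name, the choice of editing a single MLP neuron, and the target rank are assumptions for illustration.

```python
import torch

def demote_token_in_neuron(neuron_vec, W_U, target_id, target_rank=10_000):
    """Hypothetical helper: project one MLP neuron's value vector to the
    vocabulary space (logit lens), demote the sensitive token to a low
    rank, and map the result back to hidden space, all gradient-free."""
    logits = W_U @ neuron_vec                       # (vocab_size,) logit-lens view
    sorted_logits, _ = logits.sort(descending=True)
    # Localization signal: how strongly this neuron currently promotes the token.
    current_rank = (logits > logits[target_id]).sum().item()
    # Editing: overwrite the sensitive token's logit with the value held by
    # the desired (much lower) rank.
    logits[target_id] = sorted_logits[target_rank]
    # Preservation: recover a hidden-space vector whose vocabulary projection
    # matches the edited logits as closely as possible (pseudo-inverse),
    # leaving the rest of the neuron's vocabulary ranking intact.
    edited_vec = torch.linalg.pinv(W_U) @ logits
    return edited_vec, current_rank
```

Roughly, localization (step 1) picks which layers and neurons get passed to an edit like this, and restricting the edit to those few neurons is what keeps general knowledge intact (step 3).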
✅ Superior unlearning effectiveness
✅ Better model integrity preservation
✅ Stronger resistance to extraction attacks
✅ Robust across different hyperparameters
6/8
We also evaluate robustness against extraction attacks:
- Logit-lens attacks
- Delta attacks
- Perturbation attacks
Critical for real-world deployment where adversaries actively try to extract "unlearned" info.
7/8
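As a concrete example, a logit-lens attack reads the model's intermediate layers directly: if the "unlearned" token still ranks highly when an intermediate hidden state is projected through the unembedding matrix, the information was merely hidden, not removed. A minimal sketch, assuming a HuggingFace GPT-style model ("gpt2" as a placeholder) and a made-up sensitive token:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder model, not the one used in the paper
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "Contact David Lewis at"
sensitive_id = tok(" lewis", add_special_tokens=False)["input_ids"][0]  # made-up target

with torch.no_grad():
    out = model(**tok(prompt, return_tensors="pt"), output_hidden_states=True)

# Project each layer's hidden state at the last position through the
# unembedding matrix (final layer norm skipped for brevity) and check
# where the sensitive token ranks.
W_U = model.get_output_embeddings().weight           # (vocab_size, hidden_dim)
for layer, h in enumerate(out.hidden_states):
    logits = h[0, -1] @ W_U.T
    rank = (logits > logits[sensitive_id]).sum().item()
    print(f"layer {layer:2d}: sensitive-token rank = {rank}")
```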