Lightnews — Scholar-powered news

@tomerashuach.bsky.social

7 followers 13 following 8 posts

Posts Replies Media Videos

Reposted

Dana Arad

@danaarad.bsky.social

Tried steering with SAEs and found that not all features behave as expected?

Check out our new preprint - "SAEs Are Good for Steering - If You Select the Right Features" 🧵

May 27, 2025 at 4:06 PM

tomerashuach.bsky.social

@tomerashuach.bsky.social

🚨New paper at #ACL2025 Findings!
REVS: Unlearning Sensitive Information in LMs via Rank Editing in the Vocabulary Space.
LMs memorize and leak sensitive data—emails, SSNs, URLs from their training.
We propose a surgical method to unlearn it.
🧵👇w/ @boknilev.bsky.social @mtutek.bsky.social
1/8

May 27, 2025 at 8:19 AM

Add to Home Screen

Light up
your news

Add to Home Screen

Light upyour news

Sign in to Lightnews

Sign up to start reading

Connect Bluesky

Connect with Bluesky

Light up
your news