Lightnews — Scholar-powered news

@patqdasilva.bsky.social

8 followers 6 following 12 posts


Pinned

patqdasilva.bsky.social @patqdasilva.bsky.social · Jul 30

Super grateful to have received senior area chair highlight at #ACL2025NLP
⏳ The generalization of interpretability-based steering methods is at an inflection point
🚂 As a community, we need to place stronger emphasis on evaluating the reliability of methods if we care about long-term impact

ACL @aclmeeting.bsky.social · Jul 30

July 30, 2025 at 4:10 PM

patqdasilva.bsky.social @patqdasilva.bsky.social

🌟Excited to announce that “Steering off Course” was accepted to #ACL2025NLP for an Oral and Panel Discussion! arxiv.org/abs/2504.04635
📍Wed, 9AM, Level 2 Hall A

🍁I will also share this work at Actionable Interpretability @ActInterp at #ICML2025
📍Sat, 1PM, East Ballroom A

patqdasilva.bsky.social @patqdasilva.bsky.social · Apr 8

Steering language models by directly intervening on internal activations is appealing–but does it generalize?

We study 3 popular steering methods with 36 models from 14 families (1.5-70B), exposing brittle performance and fundamental flaws in underlying assumptions
🧵👇
(1/10)

July 16, 2025 at 5:03 PM

patqdasilva.bsky.social @patqdasilva.bsky.social

April 8, 2025 at 11:34 AM
