Lightnews — Scholar-powered news

@patqdasilva.bsky.social

8 followers 6 following 12 posts

Pinned

patqdasilva.bsky.social @patqdasilva.bsky.social · Jul 30

Super grateful to have received senior area chair highlight at #ACL2025NLP
⏳ The generalization of interpretability-based steering methods is at an inflection point
🚂 As a community, we need to place stronger emphasis on evaluating the reliability of methods if we care about long-term impact

ACL @aclmeeting.bsky.social · Jul 30

July 30, 2025 at 4:10 PM

patqdasilva.bsky.social

@patqdasilva.bsky.social

🌟Excited to announce that “Steering off Course” was accepted to #ACL2025NLP for an Oral and Panel Discussion! arxiv.org/abs/2504.04635
📍Wed, 9AM, Level 2 Hall A

🍁I will also share this work at Actionable Interpretability @ActInterp at #ICML2025
📍Sat, 1PM, East Ballroom A

patqdasilva.bsky.social @patqdasilva.bsky.social · Apr 8

Steering language models by directly intervening on internal activations is appealing, but does it generalize?

We study 3 popular steering methods with 36 models from 14 families (1.5-70B), exposing brittle performance and fundamental flaws in underlying assumptions
🧵👇
(1/10)

July 16, 2025 at 5:03 PM

patqdasilva.bsky.social

@patqdasilva.bsky.social

April 8, 2025 at 11:34 AM
