Sarah Wiegreffe
@sarah-nlp.bsky.social
Research in NLP (mostly LM interpretability & explainability).
Assistant prof at UMD CS + CLIP.
Previously @ai2.bsky.social @uwnlp.bsky.social
Views my own.
sarahwie.github.io
Reposted by Sarah Wiegreffe
If you're at #ICML2025, chat with me, @sarah-nlp.bsky.social, Atticus, and others at our poster 11am - 1:30pm at East #1205! We're establishing a 𝗠echanistic 𝗜nterpretability 𝗕enchmark.

We're planning to keep this a living benchmark; come by and share your ideas/hot takes!
July 17, 2025 at 5:45 PM
I am at #ICML2025! 🇨🇦🏞️
Catch me:

1️⃣ Presenting this paper👇 tomorrow 11am-1:30pm at East #1205

2️⃣ At the Actionable Interpretability @actinterp.bsky.social workshop on Saturday in East Ballroom A (I’m an organizer!)
Lots of progress in mech interp (MI) lately! But how can we measure when new mech interp methods yield real improvements over prior work?

We propose 😎 𝗠𝗜𝗕: a 𝗠echanistic 𝗜nterpretability 𝗕enchmark!
July 16, 2025 at 11:09 PM
Reposted by Sarah Wiegreffe
This week is #ICML in Vancouver, and a number of our researchers are participating. Here's the full list of Ai2's conference engagements—we look forward to connecting with fellow attendees. 👋
July 14, 2025 at 7:30 PM
A bit late to announce, but I’m excited to share that I'll be starting as an assistant professor at UMD CS @univofmaryland.bsky.social this August.

I'll be recruiting PhD students this upcoming cycle for fall 2026. (And if you're a UMD grad student, sign up for my fall seminar!)
June 13, 2025 at 6:20 PM
Reposted by Sarah Wiegreffe
🚨 We're looking for more reviewers for the workshop!
📆 Review period: May 24-June 7

If you're passionate about making interpretability useful and want to help shape the conversation, we'd love your input.

💡🔍 Self-nominate here:
docs.google.com/forms/d/e/1F...
May 20, 2025 at 12:05 AM
Check out our new preprint/project which has been over a year in the making! This has been a very fun collaboration (and one of the biggest I've personally participated in).

@amuuueller.bsky.social @boknilev.bsky.social and other co-authors are around #ICLR2025 if you want to find out more. 😊
Lots of progress in mech interp (MI) lately! But how can we measure when new mech interp methods yield real improvements over prior work?

We propose 😎 𝗠𝗜𝗕: a 𝗠echanistic 𝗜nterpretability 𝗕enchmark!
April 25, 2025 at 2:24 AM
I'm not at #ICLR2025, but have 2 works being presented:

1) Understanding how LMs answer multiple-choice questions
- arxiv.org/abs/2407.15018
- @boknilev.bsky.social is presenting the poster *now* until 12:30 (Hall 3+Hall 2B #207)
- & w/ @oyvind-t.bsky.social @hanna-nlp.bsky.social Ashish Sabharwal
April 25, 2025 at 2:18 AM
Reposted by Sarah Wiegreffe
I'm in Singapore for ICLR to present this paper:
Tomorrow, April 26th, 10-12:30 in Hall 3+2B #236
Come check it out!

arxiv.org/abs/2504.12459
April 25, 2025 at 1:55 AM
Reposted by Sarah Wiegreffe
💡 New ICLR paper! 💡
"On Linear Representations and Pretraining Data Frequency in Language Models":

We provide an explanation for when & why linear representations form in large (or small) language models.

Led by @jackmerullo.bsky.social, w/ @nlpnoah.bsky.social & @sarah-nlp.bsky.social
April 25, 2025 at 1:55 AM
Have work on the actionable impact of interpretability findings? Consider submitting to our Actionable Interpretability workshop at ICML! See below for more info.

Website: actionable-interpretability.github.io
Deadline: May 9
🎉 Our Actionable Interpretability workshop has been accepted to #ICML2025! 🎉
> Follow @actinterp.bsky.social
> Website actionable-interpretability.github.io

@talhaklay.bsky.social @anja.re @mariusmosbach.bsky.social @sarah-nlp.bsky.social @iftenney.bsky.social

Paper submission deadline: May 9th!
April 3, 2025 at 5:58 PM
Reposted by Sarah Wiegreffe
📢 Open PhD Position in Interpretable Natural Language Processing at the Department of Computer Science, UCPH!

🗓 Application deadline is 15 January 2025.

Find more information about the position and apply here 👉 di.ku.dk/english/abou...

@apepa.bsky.social @iaugenstein.bsky.social
January 8, 2025 at 10:43 AM
Enjoying my FOMO coffee this morning
December 8, 2024 at 6:39 PM
Ok one thing I sorely need here is bookmarks
November 26, 2024 at 8:06 AM