Lightnews — Scholar-powered news

Andreas Waldis

@tresiwald.bsky.social

150 followers 640 following 7 posts

Behavioral and Internal Interpretability 🔎
Incoming PostDoc Tübingen University | PhD Student at @ukplab.bsky.social, TU Darmstadt/Hochschule Luzern

Andreas Waldis

@tresiwald.bsky.social

LMs that "know more" about toxicity are less toxic!
Our #TACL 📄 connects behavior and internals:
💠 LMs amplify toxicity beyond humans
💠 Information about toxicity peaks in lower layers
💠 Bypassing these layers increases toxicity
More details👇 #NLProc #interpretability (1/🧵)

Simplified overview of our aligned probing setup, which joins the behavioral and internal evaluation of LMs' toxicity.

January 27, 2026 at 1:01 PM

Reposted by Andreas Waldis

INTERPLAY Workshop@COLM '25

@interplay-workshop.bsky.social

✨ The schedule for our INTERPLAY workshop at COLM is live! ✨
🗓️ October 10th, Room 518C
🔹 Invited talks from @sarah-nlp.bsky.social John Hewitt @amuuueller.bsky.social @kmahowald.bsky.social
🔹 Paper presentations and posters
🔹 Closing roundtable discussion.

Join us in Montréal! @colmweb.org

Schedule for the INTERPLAY workshop at COLM on October 10th, Room 518C.

09:00 am: Opening
09:10 am: Invited Talks by Sarah Wiegreffe and John Hewitt
10:20 am: Paper Presentations

Lunch Break

01:00 pm: Invited Talks by Aaron Mueller and Kyle Mahowald
02:10 pm: Poster Session
03:20 pm: Roundtable Discussion
04:50 pm: Closing

October 9, 2025 at 5:30 PM

Reposted by Andreas Waldis

INTERPLAY Workshop@COLM '25

@interplay-workshop.bsky.social

Missed a spot? If you have a pre-reviewed paper from ARR or COLM that focuses on the INTERPLAY between LM internals and behavior, there is a shortcut to presenting at our @colmweb.org workshop! ✨
Join us in Montréal! 🇨🇦

CfP: shorturl.at/sBomu
OpenReview: shorturl.at/WwWhg

#nlproc #interpretability

Call for Pre-Reviewed Papers, Interplay Workshop at COLM: July 10th - submissions due. July 24th - acceptance notification. October 10th - workshop day.

July 8, 2025 at 9:06 AM

Reposted by Andreas Waldis

INTERPLAY Workshop@COLM '25

@interplay-workshop.bsky.social

Delighted that ✨Mor Geva (@megamor2.bsky.social) and ✨Anna Ivanova (@neuranna.bsky.social) will complete our speaker line-up and talk about the INTERPLAY of model internals and behavior.

Be there and submit by June 30th 📄
shorturl.at/sBomu

See you in 🇨🇦 @colmweb.org
#nlproc #interpretability

Mor Geva and Anna Ivanova will talk at the INTERPLAY workshop.

June 24, 2025 at 1:06 PM
