Anubrata Das @ NAACL 2025
anubrata.bsky.social
Just Finished PhD @ UT Austin; Human-Centered NLP. Language Models

https://anubrata.github.io
Ah that makes sense! Thanks, yeah I am on that slack, hhh!
August 27, 2025 at 2:06 PM
How can I get an invite for the XAI discord?
August 27, 2025 at 1:12 PM
Thank you for making the list, could you please add me?
July 29, 2025 at 1:31 PM
Session detail:

Poster Session 5 - IAM: Interpretability and Analysis of Models for NLP, Hall 3
May 1, 2025 at 5:27 AM
This is collaborative work with Manoj Kumar, Ninareh Mehrabi, Anil Ramakrishna, Anna Rumshisky, Kai-Wei Chang, Aram Galstyan, Morteza Ziyadi, and Rahul Gupta.
May 1, 2025 at 5:26 AM
Causal-tracing-informed edits provide a better detoxification-degeneration trade-off.
May 1, 2025 at 5:25 AM
Model editing helps reduce toxicity. High detoxification can be achieved by simply editing random MLP layers. However, this leads to degeneration and increased perplexity.
May 1, 2025 at 5:25 AM
We find evidence of toxic memory in the early layer of GPT-2 XL for innocuous-looking adversarial prompts.
May 1, 2025 at 5:25 AM
Paper: On Localizing and Deleting Toxic Memories in Large Language Models
Anthology URL: aclanthology.org/2025.finding...
May 1, 2025 at 5:24 AM
Right, sorry for being unclear. I saw your comment sharing the Qualtrics integration tutorial with a video. bsky.app/profile/dggo...
November 25, 2024 at 9:33 PM