Lightnews — Scholar-powered news

This is collaborative work with Manoj Kumar, Ninareh Mehrabi, Anil Ramakrishna, Anna Rumshisky, Kai-Wei Chang, Aram Galstyan, Morteza Ziyadi, and Rahul Gupta.

May 1, 2025 at 5:26 AM

Anubrata Das @ NAACL 2025

@anubrata.bsky.social

Causal-tracing-informed edits provide a better detoxification-degeneration trade-off.

May 1, 2025 at 5:25 AM

Anubrata Das @ NAACL 2025

@anubrata.bsky.social

Model editing helps reduce toxicity. High detoxification can be achieved by simply editing random MLP layers. However, this leads to degeneration and increased perplexity.

May 1, 2025 at 5:25 AM

Anubrata Das @ NAACL 2025

@anubrata.bsky.social

We find evidence of toxic memory in the early layer of GPT-2 XL for innocuous-looking adversarial prompts.

May 1, 2025 at 5:25 AM

Anubrata Das @ NAACL 2025

@anubrata.bsky.social

Paper: On Localizing and Deleting Toxic Memories in Large Language Models
Anthology URL: aclanthology.org/2025.finding...


May 1, 2025 at 5:24 AM

Anubrata Das @ NAACL 2025

@anubrata.bsky.social

Right, sorry for being unclear. I saw your comment sharing the Qualtrics integration tutorial with a video. bsky.app/profile/dggo...

Dan Goldstein @dggoldst.bsky.social · Nov 25

Tom Costello (@tomcostello.bsky.social) has made freely available tools that let you build LLM interaction into Qualtrics

publish.obsidian.md/qualtrics-do...

Home - Obsidian Publish

Request: If you use our template (.QSF) to set up your research, we would appreciate it if you cite our paper when describing your method: Durably reducing conspiracy beliefs through dialogues with AI…


November 25, 2024 at 9:33 PM
