Lightnews — Scholar-powered news

Rohit Gandikota

@rohitgandikota.bsky.social

Ph.D. AI @ Northeastern University. Understanding, mapping, and editing knowledge in large generative models. Ex-Scientist Indian Space Research Organization


Rohit Gandikota

@rohitgandikota.bsky.social

Try the code yourself: it doesn't take much time and doesn't require a big GPU either!

Paper: arxiv.org/abs/2308.14761
Project: unified.baulab.info
Code: github.com/rohitgandik...

work w/ @OrgadHadas @boknilev @materzynska @davidbau

GitHub - rohitgandikota/unified-concept-editing: Unified Concept Editing in Diffusion Models


December 8, 2025 at 7:56 PM

Rohit Gandikota

@rohitgandikota.bsky.social

Curious what your model really knows? Try probing it 🕵️‍♀️

Code: github.com/kevinlu4588...
Project: unerasing.baulab.info
Paper: arxiv.org/abs/2505.17013

work led by @kevinlu4588 w/ @NickyDCFP, @mnphamx1, @davidbau, @chegday and @CohNiv

@Northeastern @nyuniversity

When Are Concepts Erased From Diffusion Models?

In concept erasure, a model is modified to selectively prevent it from generating a target concept. Despite the rapid development of new methods, it remains unclear how thoroughly these approaches...

arxiv.org

December 1, 2025 at 2:50 PM

Rohit Gandikota

@rohitgandikota.bsky.social

We also find a deep trade-off:
Robust methods (destruction-based🧨) tend to distort unrelated generations.

Understanding this helps researchers choose or design erasure methods that fit their needs.

December 1, 2025 at 2:50 PM

Rohit Gandikota

@rohitgandikota.bsky.social

⚙️Classifier Steering: Before the popular classifier-free guidance (@hojonathanho) came classifier guidance (@prafdhar)

By steering generator outputs along an external classifier's manifold, we search the knowledge of a diffusion model and bring back the erased concepts
x.com/prafdhar/st...
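
A rough sketch of the steering idea, under loud assumptions: the external classifier is replaced here by a toy logistic model, and `log_prob_and_grad`, `guided_step`, `w`, `b`, and `scale` are hypothetical names chosen for illustration, not the paper's code. After each denoising step, the sample is nudged along the gradient of the classifier's log-probability for the target concept.

```python
import numpy as np

def log_prob_and_grad(x, w, b):
    """Toy logistic classifier: log p(concept | x) and its gradient w.r.t. x."""
    z = w @ x + b
    log_p = -np.logaddexp(0.0, -z)      # numerically stable log(sigmoid(z))
    grad = (1.0 - np.exp(log_p)) * w    # d/dx log(sigmoid(w.x + b))
    return log_p, grad

def guided_step(x, denoise_step, w, b, scale):
    """One sampler step followed by a nudge toward the classifier's concept."""
    x = denoise_step(x)
    _, grad = log_prob_and_grad(x, w, b)
    return x + scale * grad

# Illustrative usage with an identity "denoiser" (an assumption, not a real sampler).
w = np.array([1.0, -2.0, 0.5])
x = np.zeros(3)
before, _ = log_prob_and_grad(x, w, 0.0)
x = guided_step(x, lambda v: v, w, 0.0, scale=1.0)
after, _ = log_prob_and_grad(x, w, 0.0)
print(before, after)
```

In a real diffusion pipeline the classifier would typically be evaluated on noisy intermediate images, and the step size would follow the sampler's noise schedule; both are omitted here.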

December 1, 2025 at 2:50 PM

Rohit Gandikota

@rohitgandikota.bsky.social

🏞️ In-context attacks: Inspired by In-context Learning in LLMs, we design a similar experiment in image models with in-painting

By showing an unfinished image and asking the model to finish it, we nudge it to search through its knowledge and complete the task through visual context
x.com/arankomatsu...
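
One way to picture the in-painting setup is a masked completion loop. This is a simplified sketch, not the authors' attack: `complete_image`, `known`, `mask`, and the toy `denoise_step` are all hypothetical, and a real in-painting sampler would also noise the known region to match each timestep.

```python
import numpy as np

def complete_image(known, mask, denoise_step, num_steps, rng):
    """Fill the region where mask == 0 while keeping the shown pixels fixed."""
    x = rng.standard_normal(known.shape)          # start from pure noise
    for t in range(num_steps):
        x = denoise_step(x, t)                    # model proposes a full image
        x = mask * known + (1.0 - mask) * x       # re-impose the visible context
    return x

# Illustrative usage: the left half of a 4x4 "image" is shown, the right half is missing.
rng = np.random.default_rng(0)
known = np.arange(16, dtype=float).reshape(4, 4)
mask = np.zeros((4, 4))
mask[:, :2] = 1.0
result = complete_image(known, mask, lambda x, t: 0.5 * x, 20, rng)
print(result)
```

The point of the design is that the model never receives the concept as text; it only sees the visible pixels and has to draw on whatever it still knows to fill the gap.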

December 1, 2025 at 2:50 PM

Rohit Gandikota

@rohitgandikota.bsky.social

🧠Training-free method: We add small amounts of noise after each denoising step, like Brownian motion in physics. We call it noise-based probing.

This technique reveals hidden "erased" knowledge inside most of the robust unlearnt models
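
A minimal sketch of what such a loop could look like, assuming a generic sampler: `sample_with_noise_probe`, `denoise_step`, and the noise scale `eta` are hypothetical names, and the toy denoiser below just shrinks the sample toward zero. The only change from an ordinary sampling loop is the extra Gaussian perturbation after each step.

```python
import numpy as np

def sample_with_noise_probe(denoise_step, x_init, num_steps, eta, rng):
    """Run a sampler, adding a small Gaussian kick after every denoising step."""
    x = x_init.copy()
    for t in range(num_steps):
        x = denoise_step(x, t)                          # ordinary denoising step
        x = x + eta * rng.standard_normal(x.shape)      # extra noise (the probe)
    return x

# Illustrative usage with a toy denoiser (an assumption, not a diffusion model).
rng = np.random.default_rng(0)
x_init = np.ones(4)
plain = sample_with_noise_probe(lambda x, t: 0.9 * x, x_init, 10, 0.0, rng)
probed = sample_with_noise_probe(lambda x, t: 0.9 * x, x_init, 10, 0.1, rng)
print(plain, probed)
```

Setting `eta` to zero recovers the unmodified sampler, which makes it easy to compare probed and unprobed generations from the same starting noise.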

December 1, 2025 at 2:50 PM

Rohit Gandikota

@rohitgandikota.bsky.social

Building on this finding, @koushik_srivats proposed STEREO, which exhaustively searches for soft prompts and removes knowledge traces inside the erased diffusion model.

STEREO is robust to optimization attacks, but @kevinlu4588 found a simple trick to show the hidden knowledge again!👇
x.com/koushik_sri...

December 1, 2025 at 2:50 PM

Rohit Gandikota

@rohitgandikota.bsky.social

📈Optimization probes: We use a small set of images to optimize a soft prompt that can generate the concept (Textual Inversion, @RinonGal et al.). @mnphamx1 found this to be a good probe to detect unerased concepts. @rhfeiyang and @materzynska confirmed this with pre-training!
x.com/materzynska...
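
The core mechanic of a soft-prompt probe, sketched on a toy problem: the model weights stay frozen and only an input embedding is trained to reproduce a few target images. Everything here is an assumption for illustration. The "model" is a frozen linear map rather than a diffusion model, the loss is plain squared error rather than a denoising loss, and `optimize_soft_prompt` is a hypothetical helper.

```python
import numpy as np

def optimize_soft_prompt(frozen_weights, targets, steps=200):
    """Gradient descent on an embedding e so that frozen_weights @ e matches targets.

    frozen_weights is never updated; only e changes.
    """
    # Step size 1/L, where L = 2 * sigma_max^2 is the gradient's Lipschitz constant.
    lr = 1.0 / (2.0 * np.linalg.norm(frozen_weights, 2) ** 2)
    mean_target = targets.mean(axis=0)
    e = np.zeros(frozen_weights.shape[1])
    for _ in range(steps):
        residual = frozen_weights @ e - mean_target
        e = e - lr * 2.0 * frozen_weights.T @ residual   # gradient of mean squared error
    return e

# Illustrative usage: recover an embedding that the frozen map can still "render".
rng = np.random.default_rng(0)
frozen = rng.standard_normal((8, 4))
e_hidden = np.array([1.0, -2.0, 0.5, 3.0])
images = np.stack([frozen @ e_hidden for _ in range(3)])
e_found = optimize_soft_prompt(frozen, images)
print(np.linalg.norm(frozen @ e_found - images[0]))
```

If some embedding makes the frozen model reproduce the concept, the concept is still represented in the weights, which is the sense in which the optimized prompt acts as a probe.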

December 1, 2025 at 2:50 PM

Rohit Gandikota

@rohitgandikota.bsky.social

The real trick is finding clever ways to stimulate the model to reveal its hidden knowledge. In this work, we found several simple probes to do that!

📈Optimization-based
🧠Training-free methods
🏞️ In-context attacks
⚙️Classifier Steering

All unlearning methods show traces!

December 1, 2025 at 2:50 PM

Rohit Gandikota

@rohitgandikota.bsky.social

First, let's define what "unlearning" means. We adopt the definition from ESD, a self-guided erasure method that uses the model's own knowledge to ablate a concept:

"A model with no knowledge of a concept, should never generate the concept irrespective of the input stimulus"
x.com/_akhaliq/st...

December 1, 2025 at 2:50 PM

Rohit Gandikota

@rohitgandikota.bsky.social

Explainer thread: x.com/materzynska/...


December 4, 2024 at 12:46 AM
