Rohit Gandikota
@rohitgandikota.bsky.social
Ph.D. AI @ Northeastern University. Understanding, mapping, and editing knowledge in large generative models. Ex-Scientist, Indian Space Research Organisation.
Try the code yourself - it doesn't take much time and doesn't require a big GPU either!!

Paper: arxiv.org/abs/2308.14761
Project: unified.baulab.info
Code: github.com/rohitgandik...

work w/ @OrgadHadas @boknilev @materzynska @davidbau
GitHub - rohitgandikota/unified-concept-editing: Unified Concept Editing in Diffusion Models
github.com
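For intuition, here's a minimal sketch of the kind of closed-form cross-attention update UCE performs; the function name, argument layout, and ridge term are illustrative, not the exact code from the repo:

```python
import torch

def uce_closed_form(W_old, erase_keys, target_vals, preserve_keys, reg=1e-2):
    """Closed-form cross-attention edit in the spirit of UCE (illustrative names).

    W_old:         (d_out, d_in) frozen key/value projection of one cross-attn layer
    erase_keys:    (n_e, d_in)   text embeddings of the concepts to erase
    target_vals:   (n_e, d_out)  outputs we want for those embeddings,
                                 e.g. W_old @ embedding("") to map them to a blank prompt
    preserve_keys: (n_p, d_in)   embeddings whose outputs should stay unchanged
    """
    # Numerator: desired outputs for erased keys + original outputs for preserved keys
    lhs = target_vals.T @ erase_keys + W_old @ (preserve_keys.T @ preserve_keys)
    # Denominator: covariance of all conditioning keys, plus a small ridge term
    rhs = erase_keys.T @ erase_keys + preserve_keys.T @ preserve_keys
    rhs = rhs + reg * torch.eye(rhs.shape[0], dtype=rhs.dtype, device=rhs.device)
    # New projection that redirects the erased concepts while preserving the rest
    return lhs @ torch.linalg.inv(rhs)
```

Because the edit is a single linear solve per attention layer, it runs in seconds - which is why no big GPU is needed.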
December 8, 2025 at 7:56 PM
Curious what your model really knows? Try probing it 🕵️‍♀️

Code: github.com/kevinlu4588...
Project: unerasing.baulab.info
Paper: arxiv.org/abs/2505.17013

work led by @kevinlu4588 w/ @NickyDCFP, @mnphamx1, @davidbau, @chegday and @CohNiv

@Northeastern @nyuniversity
When Are Concepts Erased From Diffusion Models?
In concept erasure, a model is modified to selectively prevent it from generating a target concept. Despite the rapid development of new methods, it remains unclear how thoroughly these approaches...
arxiv.org
December 1, 2025 at 2:50 PM
We also find a deep trade-off:
The more robust an erasure method is (destruction-based 🧨), the more it tends to distort unrelated generations.

Understanding this helps researchers choose or design erasure methods that fit their needs.
December 1, 2025 at 2:50 PM
⚙️Classifier Steering: Before the popular classifier-free guidance (@hojonathanho) came classifier guidance (@prafdhar)

By steering the generator's outputs with an external classifier's gradients, we search the diffusion model's remaining knowledge and bring back the erased concepts
x.com/prafdhar/st...
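As a rough sketch (not the exact repo code), the probe is a standard classifier-guidance update applied on top of the erased model's noise prediction; `classifier`, `target_class`, and `scale` are illustrative placeholders:

```python
import torch

@torch.enable_grad()
def classifier_guided_eps(x_t, t, eps_pred, classifier, target_class,
                          scale=5.0, sigma_t=1.0):
    """One classifier-guidance step used as a probe (illustrative names).

    x_t:      current noisy latent
    eps_pred: noise prediction from the *erased* diffusion model
    The external `classifier` scores how much x_t looks like the erased concept.
    """
    x_in = x_t.detach().requires_grad_(True)
    # Log-probability of the erased concept under the external classifier
    logits = classifier(x_in, t)
    log_prob = torch.log_softmax(logits, dim=-1)[:, target_class].sum()
    # Gradient that pulls samples back toward the erased concept
    grad = torch.autograd.grad(log_prob, x_in)[0]
    # Steer the erased model's prediction with the classifier gradient
    return eps_pred - scale * sigma_t * grad
```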
December 1, 2025 at 2:50 PM
🏞️ In-context attacks: Inspired by in-context learning in LLMs, we design a similar experiment in image models with inpainting

By showing the model an unfinished image and asking it to complete it, we nudge it to search through its knowledge and finish the task from visual context
x.com/arankomatsu...
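With 🤗 diffusers, such an inpainting probe could look roughly like this; the checkpoint path and the choice of pipeline are assumptions, the idea is simply to load the erased weights and let the visual context do the asking:

```python
import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

# Hypothetical path: an inpainting pipeline that carries the erased UNet weights
pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "path/to/erased-inpainting-model", torch_dtype=torch.float16
).to("cuda")

# A reference image containing the "erased" concept, with the region
# to regenerate painted white in the mask
init_image = Image.open("reference.png").convert("RGB").resize((512, 512))
mask = Image.open("mask.png").convert("RGB").resize((512, 512))

# No text prompt needed: the surrounding visual context alone nudges the
# model to complete the concept it supposedly no longer knows
result = pipe(prompt="", image=init_image, mask_image=mask).images[0]
result.save("inpainting_probe.png")
```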
December 1, 2025 at 2:50 PM
🧠Training-free method: We add small amounts of noise after each denoising step, like Brownian motion in physics. We call it noise-based probing.

This technique reveals hidden "erased" knowledge inside most of the robust unlearned models
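In code, it's just a normal sampling loop with a Brownian kick after every step; `denoise_step` and the noise scale below are placeholders:

```python
import torch

def sample_with_noise_probe(x_T, timesteps, denoise_step, noise_scale=0.05):
    """Noise-based probing (sketch). `denoise_step(x, t)` is assumed to perform
    one ordinary sampler update (e.g. a DDIM step) with the erased model."""
    x = x_T
    for t in timesteps:
        x = denoise_step(x, t)                      # regular denoising update
        x = x + noise_scale * torch.randn_like(x)   # small Brownian kick after each step
    return x
```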
December 1, 2025 at 2:50 PM
Building on this finding, @koushik_srivats proposed STEREO, which exhaustively searches for soft prompts and then removes those knowledge traces from the erased diffusion model.

STEREO is robust to optimization attacks, but @kevinlu4588 found a simple trick to show the hidden knowledge again!👇
x.com/koushik_sri...
December 1, 2025 at 2:50 PM
📈Optimization probes: We use a small set of images to optimize a soft prompt that can generate the concept (Textual Inversion, @RinonGal et al.). @mnphamx1 found this to be a good probe for detecting unerased concepts. @rhfeiyang and @materzynska confirmed this with pre-training!
x.com/materzynska...
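Conceptually, the probe optimizes a soft prompt against the frozen, erased model using the ordinary denoising loss; `eps_model` and `add_noise` below stand in for the real model and scheduler:

```python
import torch
import torch.nn.functional as F

def optimize_soft_prompt(eps_model, add_noise, latents, embed_dim=768,
                         n_tokens=4, steps=500, lr=5e-3):
    """Textual-Inversion-style probe (sketch, illustrative names).

    eps_model(x_t, t, cond) -> predicted noise from the frozen, erased model
    add_noise(x_0, noise, t) -> forward-diffused latents (e.g. a scheduler's add_noise)
    latents: VAE latents of a few real images that show the target concept
    """
    soft_prompt = torch.randn(1, n_tokens, embed_dim, requires_grad=True)
    opt = torch.optim.Adam([soft_prompt], lr=lr)
    for _ in range(steps):
        t = torch.randint(0, 1000, (latents.shape[0],))
        noise = torch.randn_like(latents)
        x_t = add_noise(latents, noise, t)
        cond = soft_prompt.expand(latents.shape[0], -1, -1)
        # Standard denoising objective, but only the soft prompt is trainable
        loss = F.mse_loss(eps_model(x_t, t, cond), noise)
        opt.zero_grad()
        loss.backward()
        opt.step()
    # If sampling with the learned embedding regenerates the concept,
    # the concept was never fully erased
    return soft_prompt.detach()
```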
December 1, 2025 at 2:50 PM
The real trick is finding clever ways to stimulate the model into revealing its hidden knowledge. In this work, we found several simple probes that do exactly that!

📈Optimization-based
🧠Training-free methods
🏞️ In-context attacks
⚙️Classifier Steering

All unlearning methods show traces!
December 1, 2025 at 2:50 PM
First, let's define what "unlearning" means. We adopt the definition from ESD, a self-guided erasure method that uses the model's own knowledge to ablate a concept:

"A model with no knowledge of a concept, should never generate the concept irrespective of the input stimulus"
x.com/_akhaliq/st...
December 1, 2025 at 2:50 PM
Explainer thread: x.com/materzynska/...
December 4, 2024 at 12:46 AM