Anubhav Jain
anubhavj480.bsky.social
PhD Candidate @ NYU
Read more about this in our paper - arxiv.org/abs/2504.20111

Thank you to all my amazing collaborators at NYU and Sony AI!!
Forging and Removing Latent-Noise Diffusion Watermarks Using a Single Image
Watermarking techniques are vital for protecting intellectual property and preventing fraudulent use of media. Most previous watermarking schemes designed for diffusion models embed a secret key in th...
arxiv.org
April 30, 2025 at 5:37 PM
We show that the same attack can also be used to remove the watermark from an already-watermarked generated image.
April 30, 2025 at 5:36 PM
This introduces negligible noise to the original image and does not alter its semantic content at all.
April 30, 2025 at 5:36 PM
We evaluate against the Tree-Rings, RingID, WIND, and Gaussian Shading watermarking schemes, showing that we can forge them with 90%+ success using a single watermarked example and a simple adversarial attack.
April 30, 2025 at 5:35 PM
Our attack simply perturbs the original image so as to push it into this vulnerable region for forgery, or away from it for removal.
April 30, 2025 at 5:35 PM
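The perturbation step can be sketched in numpy. The linear boundary below is a toy stand-in for the vulnerable region (the real attack works through DDIM inversion of the diffusion model), and the budget, step size, and dimensions are all assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the vulnerable region: points with w @ x + b > 0
# play the role of images that DDIM-invert to the key-embedded latent.
w = rng.normal(size=64)
w /= np.linalg.norm(w)
b = -0.5

def margin(x):
    return float(w @ x + b)

def perturb(x, forge=True, eps=0.05, steps=20, lr=1e-2):
    """Signed-gradient steps that push x toward the boundary (forgery)
    or away from it (removal), under an L-infinity budget eps so the
    added noise stays negligible."""
    x_adv = x.copy()
    sign = 1.0 if forge else -1.0
    for _ in range(steps):
        # For a linear boundary, the gradient of the margin is just w.
        x_adv = x_adv + sign * lr * np.sign(w)
        x_adv = np.clip(x_adv, x - eps, x + eps)
    return x_adv

x = rng.normal(size=64)
forged = perturb(x, forge=True)
removed = perturb(x, forge=False)
```

The L-infinity clipping is what keeps the perturbation imperceptible, matching the "negligible noise" claim above.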
We show that since DDIM inversion takes place with an empty prompt, there is an entire region of the clean latent space that gets mapped back to the secret-key-embedded latent. In fact, we show this region is linearly separable, and that it can itself be exploited for forgery or removal (we use this as motivation).
April 30, 2025 at 5:34 PM
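The linear-separability claim can be illustrated with a perceptron on synthetic stand-in latents (the data, dimensions, and margin filter below are invented for the demo; the paper's experiments use real DDIM-inverted latents):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in data: label +1 for latents that (hypothetically)
# map back to the key-embedded latent, -1 otherwise, built separable.
true_w = rng.normal(size=16)
true_w /= np.linalg.norm(true_w)
X = rng.normal(size=(400, 16))
keep = np.abs(X @ true_w) > 0.5   # enforce a margin for the demo
X, y = X[keep], np.sign(X[keep] @ true_w)

# A perceptron converges to zero errors iff the classes are linearly
# separable, so it doubles as a separability check.
w = np.zeros(16)
for _ in range(100):
    errors = 0
    for xi, yi in zip(X, y):
        if yi * (xi @ w) <= 0:
            w = w + yi * xi
            errors += 1
    if errors == 0:
        break

accuracy = float(np.mean(np.sign(X @ w) == y))
```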
Many thanks to all my amazing collaborators at @sonyai.bsky.social and @nyutandon.bsky.social - Yuya Kobayashi, Takashi Shibuya, Yuhta Takida, Nasir Memon, @togelius.bsky.social and Yuki Mitsufuji.
December 18, 2024 at 8:14 PM
This loss is specifically designed as a Gaussian so that unrelated concepts that are far away are not impacted.

Our approach, TraSCE, achieves SOTA results on various jailbreaking benchmarks aimed at generating NSFW content. (5/n)
December 18, 2024 at 8:11 PM
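The Gaussian shape of the loss can be sketched as follows (the width sigma and the toy score inputs are assumptions for illustration, not the paper's exact formulation):

```python
import numpy as np

def localized_loss(score, concept_score, sigma=1.0):
    """Gaussian-shaped penalty: close to 1 near the erased concept's
    score, decaying to ~0 for unrelated, far-away concepts, so those
    are left essentially untouched by the guidance."""
    d2 = float(np.sum((score - concept_score) ** 2))
    return np.exp(-d2 / (2.0 * sigma ** 2))

near = localized_loss(np.zeros(4), np.full(4, 0.1))  # related concept
far = localized_loss(np.zeros(4), np.full(4, 5.0))   # unrelated concept
```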
We modify the expression by guiding it using the unconditional score when this is the case.

We further propose a localized loss-based guidance to steer the diffusion trajectory away from the space pertaining to the concept we wish to erase. (4/n)
December 18, 2024 at 8:09 PM
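One plausible reading of the modification, sketched in numpy (the coincidence test, tolerance, fallback, and guidance scale are all assumptions, not the paper's exact expression):

```python
import numpy as np

def guided_score(eps_cond, eps_neg, eps_uncond, scale=7.5, tol=1e-3):
    """Negative-prompt guidance, except when the conditional and
    negative-prompt scores coincide (the corner case of negative
    prompting): then fall back to the unconditional score, so the
    trajectory is not pulled toward the erased concept."""
    if np.linalg.norm(eps_cond - eps_neg) < tol:
        return eps_uncond
    return eps_neg + scale * (eps_cond - eps_neg)

eps_concept = np.array([1.0, -2.0, 0.5])  # toy score of erased concept
eps_uncond = np.array([0.2, 0.1, -0.3])

# User prompt equals the negative prompt: the fallback keeps the update
# anchored to the unconditional score rather than the erased concept.
out = guided_score(eps_concept, eps_concept, eps_uncond)
```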
As we show, this is because conventional negative prompting has a very obvious corner case: when a user prompts the model with the same prompt as the negative prompt set by the model owner, the denoising process is guided toward the negative prompt (the concept we want to erase) (3/n)
December 18, 2024 at 8:09 PM
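The corner case is easy to see numerically. Below, conventional negative-prompt guidance collapses exactly onto the erased concept's score when the user's prompt matches the negative prompt (toy numbers, assumed guidance scale):

```python
import numpy as np

def negative_prompt_guidance(eps_cond, eps_neg, scale=7.5):
    """Conventional negative prompting: the negative-prompt score takes
    the place of the unconditional one in classifier-free guidance."""
    return eps_neg + scale * (eps_cond - eps_neg)

eps_concept = np.array([1.0, -2.0, 0.5])  # toy score of erased concept

# User prompt == negative prompt, so both scores are identical and the
# guidance term vanishes: the model denoises straight toward the very
# concept the negative prompt was meant to erase.
out = negative_prompt_guidance(eps_concept, eps_concept)
```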
Our new method, TraSCE, is highly effective and requires no changes to the network weights and no new examples (images or prompts). It builds on negative prompting (NP), which is widely used for generating higher-quality samples but hasn't been successful at concept erasure (2/n)
December 18, 2024 at 8:07 PM
Looks interesting, thanks for sharing!
December 4, 2024 at 10:25 PM
Long story short - we don't know.
December 4, 2024 at 10:25 PM
During memorization, all initializations for the same prompt share a single attractor or a small set of attractors (closely resembling training examples). Thus, you are unlikely to fall into the corresponding attraction basin without the memorized prompt. Quantitatively, though, the number of memorized images can vary.
December 4, 2024 at 10:19 PM
That's a good question; here is a slightly longish answer. All outputs can be thought of as attractors, where a (prompt, initialization) pair leads to one. However, with the same prompt and a different initialization, the attractor changes.
December 4, 2024 at 10:17 PM
Read our full paper here to find out more - arxiv.org/pdf/2411.16738
arxiv.org
December 4, 2024 at 9:05 PM
We showcase that this simple approach can be applied to various models and memorization scenarios to mitigate memorization successfully.
December 4, 2024 at 9:05 PM
We found that the ideal transition point corresponds to the point just after the local minimum in the magnitude of the conditional guidance. Applying standard classifier-free guidance from there on leads to high-quality, non-memorized outputs.
December 4, 2024 at 9:05 PM
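As a sketch, the switch point can be read off a curve of guidance magnitudes over denoising steps (the numbers below are invented for illustration):

```python
def transition_step(magnitudes):
    """Return the step just after the first local minimum of the
    conditional-guidance magnitude, where switching to standard
    classifier-free guidance is taken to be safe."""
    for t in range(1, len(magnitudes) - 1):
        if magnitudes[t] < magnitudes[t - 1] and magnitudes[t] <= magnitudes[t + 1]:
            return t + 1
    return len(magnitudes) - 1  # fallback: no dip found

# Toy curve: high while inside the attraction basin, dipping once the
# trajectory has been steered out, then rising again under CFG.
mags = [9.0, 7.5, 6.0, 4.8, 4.2, 4.5, 5.1, 5.0]
t_switch = transition_step(mags)
```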
We apply either no guidance or opposite guidance until an ideal transition point, after which switching to standard classifier-free guidance is unlikely to generate a memorized image.
December 4, 2024 at 9:05 PM
Successfully steering away from the attraction basin by applying either no guidance or opposite guidance in the initial time steps leads to regions in the denoising trajectory where the steering force is no longer higher than expected.
December 4, 2024 at 9:04 PM
When this happens, the conditional guidance becomes uncharacteristically high and steers the diffusion trajectory away from an unconditionally denoised one. We show that this high steering force is only present when the trajectory is inside the attraction basin.
December 4, 2024 at 9:04 PM
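A minimal detection heuristic along these lines (the threshold, dimensions, and toy scores are assumptions; the paper compares against the unconditionally denoised trajectory):

```python
import numpy as np

def guidance_magnitude(eps_cond, eps_uncond):
    """Magnitude of the conditional guidance, i.e. how hard the prompt
    is steering the trajectory away from unconditional denoising."""
    return float(np.linalg.norm(eps_cond - eps_uncond))

def inside_attraction_basin(eps_cond, eps_uncond, threshold=2.0):
    """Flag an uncharacteristically high steering force (assumed
    threshold), the signature of a memorized prompt's basin."""
    return guidance_magnitude(eps_cond, eps_uncond) > threshold

eps_uncond = np.zeros(8)
memorized = eps_uncond + 1.5   # strong pull toward a training example
ordinary = eps_uncond + 0.1    # normal prompt conditioning
```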