Anubhav Jain
anubhavj480.bsky.social
PhD Candidate @ NYU
Read more about this in our paper - arxiv.org/abs/2504.20111

Thank you to all my amazing collaborators at NYU and Sony AI!!
Forging and Removing Latent-Noise Diffusion Watermarks Using a Single Image
Watermarking techniques are vital for protecting intellectual property and preventing fraudulent use of media. Most previous watermarking schemes designed for diffusion models embed a secret key in th...
arxiv.org
April 30, 2025 at 5:37 PM
We show that the same attack can also be used to remove the watermark from an already-watermarked generated image.
April 30, 2025 at 5:36 PM
This introduces negligible noise to the original image and does not alter its semantic content at all.
April 30, 2025 at 5:36 PM
We evaluate against the Tree-Rings, RingID, WIND, and Gaussian Shading watermarking schemes, showing that we can forge them with 90%+ success using a single watermarked example and a simple adversarial attack.
April 30, 2025 at 5:35 PM
Our attack simply perturbs the original image so as to push it into this vulnerable region for forgery, or away from it for removal.
April 30, 2025 at 5:35 PM
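The perturbation step can be sketched in numpy. The linear boundary below is a toy stand-in for the vulnerable region (the real attack works through DDIM inversion of the diffusion model), and the budget, step size, and dimensions are all assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the vulnerable region: points with w @ x + b > 0
# play the role of images that DDIM-invert to the key-embedded latent.
w = rng.normal(size=64)
w /= np.linalg.norm(w)
b = -0.5

def margin(x):
    return float(w @ x + b)

def perturb(x, forge=True, eps=0.05, steps=20, lr=1e-2):
    """Signed-gradient steps that push x toward the boundary (forgery)
    or away from it (removal), under an L-infinity budget eps so the
    added noise stays negligible."""
    x_adv = x.copy()
    sign = 1.0 if forge else -1.0
    for _ in range(steps):
        # For a linear boundary, the gradient of the margin is just w.
        x_adv = x_adv + sign * lr * np.sign(w)
        x_adv = np.clip(x_adv, x - eps, x + eps)
    return x_adv

x = rng.normal(size=64)
forged = perturb(x, forge=True)
removed = perturb(x, forge=False)
```

The L-infinity clipping is what keeps the perturbation imperceptible, matching the "negligible noise" claim above.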
We show that since DDIM inversion takes place with an empty prompt, there is an entire region of the clean latent space that gets mapped back to the secret-key-embedded latent. In fact, we show this region is linearly separable, and that it can itself be exploited for forgery or removal (we use this as motivation).
April 30, 2025 at 5:34 PM
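The linear-separability claim can be illustrated with a perceptron on synthetic stand-in latents (the data, dimensions, and margin filter below are invented for the demo; the paper's experiments use real DDIM-inverted latents):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in data: label +1 for latents that (hypothetically)
# map back to the key-embedded latent, -1 otherwise, built separable.
true_w = rng.normal(size=16)
true_w /= np.linalg.norm(true_w)
X = rng.normal(size=(400, 16))
keep = np.abs(X @ true_w) > 0.5   # enforce a margin for the demo
X, y = X[keep], np.sign(X[keep] @ true_w)

# A perceptron converges to zero errors iff the classes are linearly
# separable, so it doubles as a separability check.
w = np.zeros(16)
for _ in range(100):
    errors = 0
    for xi, yi in zip(X, y):
        if yi * (xi @ w) <= 0:
            w = w + yi * xi
            errors += 1
    if errors == 0:
        break

accuracy = float(np.mean(np.sign(X @ w) == y))
```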
Many thanks to all my amazing collaborators at @sonyai.bsky.social and @nyutandon.bsky.social - Yuya Kobayashi, Takashi Shibuya, Yuhta Takida, Nasir Memon, @togelius.bsky.social and Yuki Mitsufuji.
December 18, 2024 at 8:14 PM
This loss is specifically designed as a Gaussian so that unrelated concepts that are far away are not impacted.

Our approach, TraSCE, achieves SOTA results on various jailbreaking benchmarks aimed at generating NSFW content. (5/n)
December 18, 2024 at 8:11 PM
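The Gaussian shape of the loss can be sketched as follows (the width sigma and the toy score inputs are assumptions for illustration, not the paper's exact formulation):

```python
import numpy as np

def localized_loss(score, concept_score, sigma=1.0):
    """Gaussian-shaped penalty: close to 1 near the erased concept's
    score, decaying to ~0 for unrelated, far-away concepts, so those
    are left essentially untouched by the guidance."""
    d2 = float(np.sum((score - concept_score) ** 2))
    return np.exp(-d2 / (2.0 * sigma ** 2))

near = localized_loss(np.zeros(4), np.full(4, 0.1))  # related concept
far = localized_loss(np.zeros(4), np.full(4, 5.0))   # unrelated concept
```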
We modify the expression by guiding it using the unconditional score when this is the case.

We further propose a localized loss-based guidance to steer the diffusion trajectory away from the space pertaining to the concept we wish to erase. (4/n)
December 18, 2024 at 8:09 PM
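One plausible reading of the modification, sketched in numpy (the coincidence test, tolerance, fallback, and guidance scale are all assumptions, not the paper's exact expression):

```python
import numpy as np

def guided_score(eps_cond, eps_neg, eps_uncond, scale=7.5, tol=1e-3):
    """Negative-prompt guidance, except when the conditional and
    negative-prompt scores coincide (the corner case of negative
    prompting): then fall back to the unconditional score, so the
    trajectory is not pulled toward the erased concept."""
    if np.linalg.norm(eps_cond - eps_neg) < tol:
        return eps_uncond
    return eps_neg + scale * (eps_cond - eps_neg)

eps_concept = np.array([1.0, -2.0, 0.5])  # toy score of erased concept
eps_uncond = np.array([0.2, 0.1, -0.3])

# User prompt equals the negative prompt: the fallback keeps the update
# anchored to the unconditional score rather than the erased concept.
out = guided_score(eps_concept, eps_concept, eps_uncond)
```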
As we show, this is because conventional negative prompting has a very obvious corner case: when a user prompts the model with the same prompt as the negative prompt set by the model owner, the denoising process is guided toward the negative prompt (the concept we want to erase) (3/n)
December 18, 2024 at 8:09 PM
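The corner case is easy to see numerically. Below, conventional negative-prompt guidance collapses exactly onto the erased concept's score when the user's prompt matches the negative prompt (toy numbers, assumed guidance scale):

```python
import numpy as np

def negative_prompt_guidance(eps_cond, eps_neg, scale=7.5):
    """Conventional negative prompting: the negative-prompt score takes
    the place of the unconditional one in classifier-free guidance."""
    return eps_neg + scale * (eps_cond - eps_neg)

eps_concept = np.array([1.0, -2.0, 0.5])  # toy score of erased concept

# User prompt == negative prompt, so both scores are identical and the
# guidance term vanishes: the model denoises straight toward the very
# concept the negative prompt was meant to erase.
out = negative_prompt_guidance(eps_concept, eps_concept)
```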
Our new method, TraSCE, is highly effective and requires no changes to the network weights and no new examples (images or prompts). It builds on negative prompting (NP), which is widely used for generating higher-quality samples but hasn't been successful at concept erasure (2/n)
December 18, 2024 at 8:07 PM
Looks interesting, thanks for sharing!
December 4, 2024 at 10:25 PM
Long story short - we don't know.
December 4, 2024 at 10:25 PM
During memorization, all initializations for the same prompt share a single attractor or a small set of attractors (closely resembling training examples). Thus, you are unlikely to fall into the corresponding attraction basin without the memorized prompt. Quantitatively, though, the number of memorized images can vary.
December 4, 2024 at 10:19 PM
That's a good question; here is a slightly longish answer. All outputs can be thought of as attractors, where a (prompt, initialization) pair leads to one. However, with the same prompt and a different initialization, the attractor changes.
December 4, 2024 at 10:17 PM
Read our full paper here to find out more - arxiv.org/pdf/2411.16738
arxiv.org
December 4, 2024 at 9:05 PM
We showcase that this simple approach can be applied to various models and memorization scenarios to mitigate memorization successfully.
December 4, 2024 at 9:05 PM
We found that the ideal transition point corresponds to the point just after the local minimum in the magnitude of the conditional guidance. Applying standard classifier-free guidance from there on leads to high-quality, non-memorized outputs.
December 4, 2024 at 9:05 PM
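As a sketch, the switch point can be read off a curve of guidance magnitudes over denoising steps (the numbers below are invented for illustration):

```python
def transition_step(magnitudes):
    """Return the step just after the first local minimum of the
    conditional-guidance magnitude, where switching to standard
    classifier-free guidance is taken to be safe."""
    for t in range(1, len(magnitudes) - 1):
        if magnitudes[t] < magnitudes[t - 1] and magnitudes[t] <= magnitudes[t + 1]:
            return t + 1
    return len(magnitudes) - 1  # fallback: no dip found

# Toy curve: high while inside the attraction basin, dipping once the
# trajectory has been steered out, then rising again under CFG.
mags = [9.0, 7.5, 6.0, 4.8, 4.2, 4.5, 5.1, 5.0]
t_switch = transition_step(mags)
```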
We apply either no guidance or opposite guidance until an ideal transition point, after which switching to standard classifier-free guidance is unlikely to generate a memorized image.
December 4, 2024 at 9:05 PM
Successfully steering away from the attraction basin by applying either no guidance or opposite guidance in the initial time steps leads to regions in the denoising trajectory where the steering force is no longer higher than expected.
December 4, 2024 at 9:04 PM
When this happens, the conditional guidance becomes uncharacteristically high and steers the diffusion trajectory away from an unconditionally denoised one. We show that this high steering force is only present when the trajectory is inside the attraction basin.
December 4, 2024 at 9:04 PM
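A minimal detection heuristic along these lines (the threshold, dimensions, and toy scores are assumptions; the paper compares against the unconditionally denoised trajectory):

```python
import numpy as np

def guidance_magnitude(eps_cond, eps_uncond):
    """Magnitude of the conditional guidance, i.e. how hard the prompt
    is steering the trajectory away from unconditional denoising."""
    return float(np.linalg.norm(eps_cond - eps_uncond))

def inside_attraction_basin(eps_cond, eps_uncond, threshold=2.0):
    """Flag an uncharacteristically high steering force (assumed
    threshold), the signature of a memorized prompt's basin."""
    return guidance_magnitude(eps_cond, eps_uncond) > threshold

eps_uncond = np.zeros(8)
memorized = eps_uncond + 1.5   # strong pull toward a training example
ordinary = eps_uncond + 0.1    # normal prompt conditioning
```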