Lightnews — Scholar-powered news

Chris Wendler

@wendlerc.bsky.social

We also have a website sdxl-unbox.epfl.ch
and a paper arxiv.org/abs/2410.22366

Unboxing SDXL Turbo with SAEs

Sparse Autoencoders (SAEs) find interpretable features in Stable Diffusion Turbo and enable fine-grained image editing.

sdxl-unbox.epfl.ch

March 21, 2025 at 7:39 PM

Chris Wendler

@wendlerc.bsky.social

Huge shoutout to Viacheslav Surkov who executed this project! This is what can happen when you keep pushing on your course project :P Really amazing!

March 21, 2025 at 7:39 PM

Chris Wendler

@wendlerc.bsky.social

If you go and play with this app, you are guaranteed to find some fascinating quirk about SDXL Turbo that no one has ever seen before, which is why I love this work!

March 21, 2025 at 7:39 PM

Chris Wendler

@wendlerc.bsky.social

If you are someone who is great at designing user interfaces and want to build a better app or website with us, reach out to me via DM.

If you are someone just curious about deep learning and diffusion models, go play with the features. We have more than 2000 features per layer.

March 21, 2025 at 7:39 PM

Chris Wendler

@wendlerc.bsky.social

It should be pretty self-explanatory to use this app. You type in the feature index, select the layer and the strength of the coefficient, brush a mask where the feature should be activated, and hit "apply"...

March 21, 2025 at 7:39 PM
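The workflow in the post above (feature index, layer, strength, brushed mask) can be sketched roughly as follows. This is only a minimal illustration under stated assumptions, not the app's actual code: the function name `apply_feature`, the `(H, W, C)` activation layout, and the idea of adding a scaled decoder direction at masked positions are all hypothetical choices made for the example.

```python
import numpy as np

def apply_feature(activations, decoder, feature_index, strength, mask):
    """Add strength * decoder[feature_index] at the masked spatial positions.

    activations: (H, W, C) array, a stand-in for one block's output (assumed layout)
    decoder:     (num_features, C) array of SAE decoder directions (assumed)
    mask:        (H, W) boolean array, the brushed region
    """
    direction = decoder[feature_index]        # shape (C,)
    edited = activations.copy()               # leave the input untouched
    edited[mask] += strength * direction      # broadcasts over all masked positions
    return edited

# Toy data standing in for real activations and a trained SAE decoder.
rng = np.random.default_rng(0)
acts = rng.normal(size=(16, 16, 8))
dec = rng.normal(size=(32, 8))
mask = np.zeros((16, 16), dtype=bool)
mask[4:8, 4:8] = True                         # a small square "brush stroke"

out = apply_feature(acts, dec, feature_index=5, strength=3.0, mask=mask)
```

In this sketch, positions outside the mask are unchanged, which loosely mirrors the idea of brushing a feature onto one region of the image.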

Chris Wendler

@wendlerc.bsky.social

You can also do more "abstract" things like brushing the face with a "water"-texture feature...

March 21, 2025 at 7:39 PM

Chris Wendler

@wendlerc.bsky.social

But that's not the best part yet. My favorite layer is the "style" layer. It allows you to draw with textures without modifying the rest of the image much. E.g. this happens when you brush the face with the "giraffe texture feature".

March 21, 2025 at 7:39 PM

Chris Wendler

@wendlerc.bsky.social

Inspired by this I also made one where I tried to take the hole-feature from a "Trypophobia" image...

March 21, 2025 at 7:39 PM

Chris Wendler

@wendlerc.bsky.social

Let's see what happens if we turn on a feature that activates on the beard but in the detail layer... We noticed in our experiments that these features often latch onto the context of the generated image (and require relevant context to be effective). The result is wild!

March 21, 2025 at 7:39 PM

Chris Wendler

@wendlerc.bsky.social

There is also one that seems to have something to do with the beard. Turning it on shows that it probably is more than just a beard... maybe a "manliness" feature or something like that.

March 21, 2025 at 7:39 PM

Chris Wendler

@wendlerc.bsky.social

We can look for interesting features in the "explore" tab. E.g. in the "composition" block feature number 199 seems to have to do with that hat. Let's turn it on...

March 21, 2025 at 7:39 PM

Chris Wendler

@wendlerc.bsky.social

Let's start with the prompt "an image of a colorful model"

March 21, 2025 at 7:39 PM

Chris Wendler

@wendlerc.bsky.social

We built an app where you can explore these features and turn them on while generating an image. The app is here:
huggingface.co/spaces/surok...

Unboxing SDXL with SAEs - a Hugging Face Space by surokpro2

March 21, 2025 at 7:39 PM

Chris Wendler

@wendlerc.bsky.social

And are more transferable. I think with the current methods, we get the best interpretations by accumulating many different agreeing angles onto the same question. SAEs can be one of them, saliency maps another, patching another, and so on. But we still need better tools.

March 10, 2025 at 12:59 PM

Chris Wendler

@wendlerc.bsky.social

I am also skeptical about SAEs and steering. I usually compare SAE/steering interventions to adversarial attacks (AAs). For AAs we know that they can yield arbitrary outputs via minimal perturbations of arbitrary intermediate states. Compared to those, steering/SAE interventions IMO use less optimization power.

March 10, 2025 at 12:57 PM

Chris Wendler

@wendlerc.bsky.social

What exactly this tells us about the mechanisms is an open question. Compared to editing the input, you can get very different but still interpretable effects. E.g., in our work we basically found features that can be turned into style brushes by SAEing up.0.1 in SDXL Turbo: sdxl-unbox.epfl.ch

March 10, 2025 at 12:53 PM
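For readers unfamiliar with what "SAEing" a block means, here is a minimal sketch of a sparse autoencoder forward pass on a batch of activation vectors. It assumes a top-k sparsity rule and randomly initialized weights; the names `sae_encode` and `sae_decode`, the shapes, and the choice of top-k are illustrative assumptions, not details confirmed by the posts.

```python
import numpy as np

def sae_encode(x, W_enc, b_enc, k):
    """Encode activations into sparse codes, keeping the k largest units per row."""
    pre = np.maximum(x @ W_enc + b_enc, 0.0)              # ReLU pre-codes, shape (n, F)
    idx = np.argpartition(pre, -k, axis=1)[:, -k:]        # indices of the k largest per row
    codes = np.zeros_like(pre)
    np.put_along_axis(codes, idx, np.take_along_axis(pre, idx, axis=1), axis=1)
    return codes

def sae_decode(codes, W_dec, b_dec):
    """Reconstruct activations as a sparse combination of decoder directions."""
    return codes @ W_dec + b_dec

# Toy setup: 4 activation vectors of width 8, a dictionary of 32 features.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
W_enc = rng.normal(size=(8, 32))
b_enc = np.zeros(32)
W_dec = rng.normal(size=(32, 8))
b_dec = np.zeros(8)

codes = sae_encode(x, W_enc, b_enc, k=3)
recon = sae_decode(codes, W_dec, b_dec)
```

Each row of `W_dec` plays the role of one feature direction; in a trained SAE these are the directions one would inspect and intervene on.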

Chris Wendler

@wendlerc.bsky.social

I did not read that paper with the random weights, but this is something I imagine impossible to do with random weights and thus probably not properly discussed in that work.

March 10, 2025 at 12:54 AM

Chris Wendler

@wendlerc.bsky.social

This is a different angle that makes interpretations of SAE features testable: "Do they affect the remaining forward pass in a way consistent with my interpretation?"

March 10, 2025 at 12:52 AM
