Chris Wendler
wendlerc.bsky.social
Postdoc at the interpretable deep learning lab at Northeastern University, deep learning, LLMs, mechanistic interpretability
The app should be pretty self-explanatory. You type in the feature index, select the layer and the strength (coefficient), brush a mask where the feature should be activated, and hit "apply"...
March 21, 2025 at 7:39 PM
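For readers wondering what "apply" might do under the hood, here is a minimal sketch, not the app's actual code. It assumes the intervention adds the selected feature's decoder direction, scaled by the strength, to the block's activations at the brushed positions. The names `apply_feature`, `decoder`, and the toy shapes are hypothetical.

```python
import numpy as np

def apply_feature(activations, decoder_direction, mask, strength):
    """Add strength * decoder_direction at every brushed spatial position.

    activations:       (H, W, C) intermediate output of one block (assumed layout)
    decoder_direction: (C,) decoder vector of the chosen SAE feature
    mask:              (H, W) boolean brush mask
    """
    steered = activations.copy()
    # Boolean indexing selects the masked positions as an (n, C) array;
    # the (C,) direction broadcasts across them.
    steered[mask] += strength * decoder_direction
    return steered

# Toy stand-ins for real activations and a trained decoder (random values).
rng = np.random.default_rng(0)
H, W, C, n_features = 16, 16, 8, 32
activations = rng.normal(size=(H, W, C))
decoder = rng.normal(size=(n_features, C))

feature_index = 5          # the index you would type into the app
strength = 2.0             # the coefficient
mask = np.zeros((H, W), dtype=bool)
mask[4:8, 4:8] = True      # the brushed region

steered = apply_feature(activations, decoder[feature_index], mask, strength)
```

Positions outside the mask are left untouched, which matches the idea of brushing a feature onto only part of the image.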
You can also do more "abstract" things like brushing the face with a "water"-texture feature...
March 21, 2025 at 7:39 PM
But that's not the best part yet. My favorite layer is the "style" layer. It allows you to draw with textures without modifying the rest of the image much. E.g. this happens when you brush the face with the "giraffe texture feature".
March 21, 2025 at 7:39 PM
Inspired by this I also made one where I tried to take the hole-feature from a "Trypophobia" image...
March 21, 2025 at 7:39 PM
Let's see what happens if we turn on a feature that activates on the beard but in the detail layer... We noticed in our experiments that these features often latch onto the context of the generated image (and require relevant context to be effective). The result is wild!
March 21, 2025 at 7:39 PM
There is also one that seems to have something to do with the beard. Turning it on shows that it probably is more than just a beard... maybe a "manliness" feature or something like that.
March 21, 2025 at 7:39 PM
We can look for interesting features in the "explore" tab. E.g. in the "composition" block feature number 199 seems to have to do with that hat. Let's turn it on...
March 21, 2025 at 7:39 PM
Let's start with the prompt "an image of a colorful model"
March 21, 2025 at 7:39 PM
In case you ever wondered what you could do if you had SAEs for intermediate results of diffusion models, we trained SDXL Turbo SAEs on 4 blocks for you. We noticed that they specialize into a "composition", a "detail", and a "style" block. And one that is hard to make sense of.
March 21, 2025 at 7:39 PM
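As a rough illustration of what an SAE on an intermediate result computes, here is a minimal sketch. It assumes a plain ReLU autoencoder with random weights; the SAEs from the thread may use a different architecture and are of course trained. All names (`sae_encode`, `sae_decode`, `top_features`) are hypothetical.

```python
import numpy as np

def sae_encode(x, W_enc, b_enc):
    """Map activations (N, C) to non-negative feature activations (N, F)."""
    return np.maximum(x @ W_enc + b_enc, 0.0)

def sae_decode(f, W_dec, b_dec):
    """Reconstruct activations (N, C) from feature activations (N, F)."""
    return f @ W_dec + b_dec

def top_features(features, k):
    """Return the k feature indices with the highest mean activation."""
    scores = features.mean(axis=0)
    return np.argsort(scores)[::-1][:k]

# Toy dimensions: N spatial positions, C channels, F dictionary features.
rng = np.random.default_rng(0)
N, C, F = 256, 8, 32
x = rng.normal(size=(N, C))           # stand-in for one block's output
W_enc = rng.normal(size=(C, F))
b_enc = np.zeros(F)
W_dec = rng.normal(size=(F, C))
b_dec = np.zeros(C)

features = sae_encode(x, W_enc, b_enc)
reconstruction = sae_decode(features, W_dec, b_dec)
strongest = top_features(features, k=5)
```

Ranking features by how strongly they fire on an image is one simple way to look for candidates worth turning on.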