ExplainableML
@eml-munich.bsky.social
Institute for Explainable Machine Learning at @www.helmholtz-munich.de and Interpretable and Reliable Machine Learning group at Technical University of Munich; part of @munichcenterml.bsky.social
Results. GenEval: SDXL 0.55→0.61 (notable gains in two objects, counting, color attribution). T2I-CompBench: broad boosts (esp. Color/Texture). DPG-Bench (SDXL): DSG 74.65→79.26, Q-Align 0.72→0.81; user study: RankDPO wins over SDXL & DPO-SDXL.
October 20, 2025 at 12:35 PM
We propose RankDPO—a listwise preference objective that weights pairwise denoising comparisons using DCG-style gains/discounts, optimizing the entire ranking per prompt rather than isolated pairs.
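A rough sketch of how such DCG-style weights could be computed per comparison (the gain/discount formulas and the scores below are illustrative assumptions, not the paper's exact implementation):

```python
import math

def dcg_pair_weight(rank_better: int, rank_worse: int,
                    score_better: float, score_worse: float) -> float:
    """Weight one pairwise comparison by how much it matters to the list's DCG
    (0-indexed rank positions, higher reward score = better)."""
    gain_diff = (2 ** score_better - 1) - (2 ** score_worse - 1)
    discount_diff = 1 / math.log2(rank_better + 2) - 1 / math.log2(rank_worse + 2)
    return abs(gain_diff * discount_diff)

# Four generations for one prompt, already ranked by ensembled reward scores.
scores = [0.9, 0.7, 0.4, 0.2]  # hypothetical scores, list index = rank position
for i in range(len(scores)):
    for j in range(i + 1, len(scores)):
        print(f"pair ({i} over {j}): weight {dcg_pair_weight(i, j, scores[i], scores[j]):.3f}")
# Each weight would scale the corresponding pairwise DPO term on the denoising
# predictions, so all comparisons in the ranking are optimized jointly.
```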
October 20, 2025 at 12:35 PM
Direct Preference Optimization is strong for T2I, but human preference labels are expensive and quickly go stale. We build Syn-Pic: a fully synthetic ranked preference dataset, created by ensembling 5 reward models to remove humans from the loop.
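A minimal sketch of the ensembling step (the normalization and the toy scores are assumptions; the actual pipeline and reward models may differ):

```python
import numpy as np

def rank_generations(scores_per_model: np.ndarray) -> np.ndarray:
    """scores_per_model: (num_reward_models, num_images) raw scores for one prompt.
    Returns image indices ordered from most to least preferred."""
    # z-normalize per model so no single reward model dominates the ensemble
    z = (scores_per_model - scores_per_model.mean(axis=1, keepdims=True)) / (
        scores_per_model.std(axis=1, keepdims=True) + 1e-8
    )
    ensemble = z.mean(axis=0)
    return np.argsort(-ensemble)

# 3 hypothetical reward models scoring 4 generated images for one prompt.
scores = np.array([
    [0.2, 0.8, 0.5, 0.9],
    [0.1, 0.7, 0.6, 0.8],
    [0.3, 0.9, 0.4, 0.7],
])
print(rank_generations(scores))  # [1 3 2 0]: a full ranking rather than a single pair
```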
October 20, 2025 at 12:35 PM
2/
Scalable Ranked Preference Optimization for Text-to-Image Generation
@shyamgopal.bsky.social, Huseyin Coskun, @zeynepakata.bsky.social, Sergey Tulyakov, Jian Ren, Anil Kag
[Paper]: arxiv.org/pdf/2410.18013
📍Hall I #1702
🕑Oct 22, Poster Session 4
October 20, 2025 at 12:35 PM
SUB enables rigorous stress-testing of interpretable models. We find that CBMs fail to generalize to these novel combinations of known concepts.
October 20, 2025 at 12:35 PM
To generate precise variations, we propose Tied Diffusion Guidance (TDG) — sharing noise across two parallel denoising processes to ensure correct class and attribute generation.
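A toy sketch of the tying idea (the blending rule and the stand-in noise predictor are assumptions for illustration, not TDG's exact guidance):

```python
import torch

def tied_denoising_step(eps_model, x_t, t, cond_a, cond_b, alpha=0.5):
    """Both denoising branches see the SAME latent x_t; their noise predictions
    are tied (blended) so the two prompts stay on one shared trajectory."""
    eps_a = eps_model(x_t, t, cond_a)   # e.g. the bird-class prompt
    eps_b = eps_model(x_t, t, cond_b)   # e.g. the attribute-substituted prompt
    return alpha * eps_a + (1 - alpha) * eps_b

# Toy stand-in for a noise-prediction network, only to make the sketch runnable.
toy_eps_model = lambda x, t, c: 0.1 * x + c
x_t = torch.randn(1, 4, 8, 8)                       # shared latent noise
cond_a, cond_b = torch.zeros_like(x_t), torch.ones_like(x_t)
print(tied_denoising_step(toy_eps_model, x_t, 0, cond_a, cond_b).shape)  # torch.Size([1, 4, 8, 8])
```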
October 20, 2025 at 12:35 PM
We introduce SUB, a fine-grained image & concept benchmark with 38,400 synthetic bird images 🦤.
Using 33 classes & 45 concepts (e.g., wing color, belly pattern), SUB tests how robust CBMs are to targeted concept variations.
October 20, 2025 at 12:35 PM
Concept Bottleneck Models (CBMs) hold huge promise for making AI more transparent—especially in high-stakes fields like medicine. But how well do they hold up under distribution shifts? 🧠
October 20, 2025 at 12:35 PM
1/
SUB: Benchmarking CBM Generalization via Synthetic Attribute Substitutions
@jessica-bader.bsky.social, @lgirrbach.bsky.social, Stephan Alaniz, @zeynepakata.bsky.social
[Paper]: arxiv.org/pdf/2507.23784
[Code]: github.com/ExplainableM...
📍Hall I #2142
🕑Oct 23, Poster Session 5
October 20, 2025 at 12:35 PM
💡We replace reward-guided test-time noise optimization in diffusion models with a Noise Hypernetwork that modulates the initial input noise.
Please visit @lucaeyring.bsky.social's thread for a wonderful illustration of the method 👇
x.com/LucaEyring/s...
Reward hacking is challenging when fine-tuning few-step Diffusion models. Direct fine-tuning on rewards can create artifacts that game metrics while degrading visual quality. We propose Noise Hypernetworks as a theoretically grounded solution, inspired by test-time optimization. https://t.co/KqLlwbZpG8
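A minimal sketch of the core idea (the architecture below is an illustrative assumption, not the paper's network): a small learned module modulates the sampled noise before the frozen few-step model denoises it.

```python
import torch
import torch.nn as nn

class NoiseHypernetwork(nn.Module):
    """Learns a residual update to the initial latent noise; the (frozen)
    few-step diffusion model then denoises from the modulated noise."""
    def __init__(self, latent_channels: int = 4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(latent_channels, 64, 3, padding=1), nn.SiLU(),
            nn.Conv2d(64, latent_channels, 3, padding=1),
        )

    def forward(self, eps: torch.Tensor) -> torch.Tensor:
        return eps + self.net(eps)

eps = torch.randn(2, 4, 16, 16)            # initial latent noise
eps_mod = NoiseHypernetwork()(eps)         # modulated noise, same shape
print(eps_mod.shape)                       # torch.Size([2, 4, 16, 16])
# Training would backpropagate a reward (plus a regularizer toward the base
# noise distribution) through the frozen generator into this small network only.
```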
October 13, 2025 at 2:44 PM
3/
Noise Hypernetworks: Amortizing Test-Time Compute in Diffusion Models
@lucaeyring.bsky.social, @shyamgopal.bsky.social, Alexey Dosovitskiy, @natanielruiz.bsky.social, @zeynepakata.bsky.social
[Paper]: arxiv.org/abs/2508.09968
[Code]: github.com/ExplainableM...
October 13, 2025 at 2:44 PM
🔥 These methods uncover internal “chunks” that correspond to meaningful concepts in both RNNs and LLMs. Activating a chunk causally steers the model toward that concept — so these chunks aren’t just correlations but functional building blocks.
October 13, 2025 at 2:44 PM
💡We propose three chunking techniques: Discrete Sequence Chunking (making a dictionary of recurring patterns), Population Averaging (aligning neural states with known labels), and Unsupervised Chunk Discovery (finding chunks without labels).
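To give a flavor of the first technique, a toy version of building a dictionary of recurring patterns (a simplified illustration, not the paper's algorithm):

```python
from collections import Counter

def chunk_sequence(seq, num_merges=2):
    """Greedily merge the most frequent adjacent pair of symbols into a new
    chunk, building a small dictionary of recurring patterns."""
    seq = list(seq)
    dictionary = []
    for _ in range(num_merges):
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        (a, b), _ = pairs.most_common(1)[0]
        merged = f"{a}+{b}"
        dictionary.append(merged)
        out, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == (a, b):
                out.append(merged)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return seq, dictionary

# A discretized activation stream with a recurring motif "A B C".
print(chunk_sequence(list("ABCXABCYABC")))
# (['A+B+C', 'X', 'A+B+C', 'Y', 'A+B+C'], ['A+B', 'A+B+C'])
```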
October 13, 2025 at 2:44 PM
🤔Neural networks are often labeled as 'black boxes', yet we found that population activity mirrors patterns in training data (the “Reflection Hypothesis”). Can we chunk those patterns into human-legible concepts?
October 13, 2025 at 2:44 PM
2/
Concept-Guided Interpretability via Neural Chunking
Shuchen Wu, Stephan Alaniz, @shyamgopal.bsky.social, Peter Dayan, @ericschulz.bsky.social, @zeynepakata.bsky.social
[Paper]: arxiv.org/pdf/2505.11576
[Code]: github.com/swu32/Chunk-...
October 13, 2025 at 2:44 PM
✨ Applying SAE interventions to CLIP’s vision encoder directly steers multimodal LLM outputs (e.g., LLaVA) without modifying the base model. These results highlight SAEs as a practical, unsupervised approach to improving both interpretability and control in VLMs.
October 13, 2025 at 2:44 PM
💡 Our experiments show that SAEs trained on VLMs substantially enhance neuron-level monosemanticity. Sparsity promotes disentanglement, while wider latent spaces yield richer, more human-interpretable concept representations.
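For context, a minimal sparse autoencoder over activation vectors (a generic sketch with assumed dimensions and loss weights, not the paper's training setup):

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Overcomplete autoencoder over (V)LM activations; an L1 penalty on the
    latent code pushes each latent toward firing for a single concept."""
    def __init__(self, d_model: int = 768, d_latent: int = 8 * 768):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_latent)
        self.decoder = nn.Linear(d_latent, d_model)

    def forward(self, x):
        z = torch.relu(self.encoder(x))        # sparse, non-negative code
        return self.decoder(z), z

sae = SparseAutoencoder()
acts = torch.randn(32, 768)                    # e.g. vision-encoder token activations
recon, z = sae(acts)
loss = ((recon - acts) ** 2).mean() + 1e-3 * z.abs().mean()  # reconstruction + sparsity
print(loss.item() > 0)
```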
October 13, 2025 at 2:44 PM
💬 To ensure our metric reflects meaningful structure, we validate it through a large-scale user study, showing strong alignment between our quantitative measure and human perception of monosemanticity.
October 13, 2025 at 2:44 PM
📈 We introduce a framework to quantitatively evaluate monosemanticity at the neuron level in VLMs, before and after applying SAEs. This enables systematic measurement of how interpretability evolves as representations become more sparse and structured.
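One plausible way to operationalize such a neuron-level score (an assumption for illustration; the paper defines its own measure): check how similar a neuron's top-activating images are to each other.

```python
import torch

def monosemanticity_score(activations: torch.Tensor, image_embeddings: torch.Tensor,
                          neuron: int, k: int = 8) -> float:
    """activations: (num_images, num_neurons); image_embeddings: (num_images, d).
    A neuron looks monosemantic if its top-activating images resemble one another."""
    top = activations[:, neuron].topk(k).indices
    emb = torch.nn.functional.normalize(image_embeddings[top], dim=-1)
    sim = emb @ emb.T                                   # pairwise cosine similarity
    off_diag = sim[~torch.eye(k, dtype=torch.bool)]     # drop self-similarities
    return off_diag.mean().item()

acts = torch.rand(100, 512)     # toy neuron activations
embs = torch.randn(100, 64)     # toy image embeddings
print(monosemanticity_score(acts, embs, neuron=0))
```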
October 13, 2025 at 2:44 PM
🧩 Sparse Autoencoders (SAEs) are powerful tools for interpreting LLMs by disentangling neurons into monosemantic units, yet their effectiveness in Vision-Language Models (VLMs) remains underexplored.
October 13, 2025 at 2:44 PM
1/
Sparse Autoencoders Learn Monosemantic Features in Vision-Language Models
Mateusz Pach, @shyamgopal.bsky.social, @qbouniot.bsky.social, Serge Belongie, @zeynepakata.bsky.social
[Paper]: arxiv.org/pdf/2504.02821
[Code]: github.com/ExplainableM...
October 13, 2025 at 2:44 PM
6/
Disentanglement of Correlated Factors via Hausdorff Factorized Support (ICLR 2023)
@confusezius.bsky.social, Mark Ibrahim, @zeynepakata.bsky.social, Pascal Vincent, Diane Bouchacourt
[Paper]: arxiv.org/abs/2210.07347
[Code]: github.com/facebookrese...
August 4, 2025 at 2:11 PM