ExplainableML
@eml-munich.bsky.social
Institute for Explainable Machine Learning at @www.helmholtz-munich.de and Interpretable and Reliable Machine Learning group at Technical University of Munich and part of @munichcenterml.bsky.social
Results. GenEval: SDXL 0.55→0.61 (notable gains on Two Objects, Counting, and Color Attribution). T2I-CompBench: broad boosts, especially Color and Texture. DPG-Bench (SDXL): DSG 74.65→79.26, Q-Align 0.72→0.81. User study: RankDPO wins over both SDXL and DPO-SDXL.
October 20, 2025 at 12:35 PM
We propose RankDPO—a listwise preference objective that weights pairwise denoising comparisons using DCG-style gains/discounts, optimizing the entire ranking per prompt rather than isolated pairs.
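A minimal sketch of what such a DCG-weighted listwise objective could look like (illustrative only, not necessarily the paper's exact formulation; dpo_margin stands in for a Diffusion-DPO-style implicit-reward margin against the reference model):

import torch
import torch.nn.functional as F

def rankdpo_loss(dpo_margin, scores, beta=1.0):
    # dpo_margin[i]: DPO-style margin of image i vs. the reference model (higher = preferred by the policy)
    # scores[i]: ensemble reward score used to rank the images for this prompt
    ranks = scores.argsort(descending=True).argsort() + 1       # rank 1 = best-scored image
    gain = 2.0 ** scores                                        # DCG-style gain
    discount = 1.0 / torch.log2(1.0 + ranks.float())            # DCG-style rank discount
    i, j = torch.where(scores[:, None] > scores[None, :])       # all "i is preferred over j" pairs
    weight = (gain[i] - gain[j]) * discount[i]                  # weight each pairwise comparison
    loss = -F.logsigmoid(beta * (dpo_margin[i] - dpo_margin[j]))
    return (weight * loss).sum() / weight.sum()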
October 20, 2025 at 12:35 PM
Direct Preference Optimization is strong for T2I, but human preference labels are expensive to collect and quickly become outdated. We build Syn-Pic: a fully synthetic ranked preference dataset that ensembles 5 reward models to remove humans from the loop.
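Roughly, the ranking step can be as simple as the sketch below (generate, reward_models, and their score method are hypothetical placeholders, not the released pipeline):

# Score each generated image with an ensemble of reward models and rank them per prompt.
def build_ranked_preferences(prompt, generate, reward_models, n_images=4):
    images = [generate(prompt) for _ in range(n_images)]
    # Average scores across the ensemble so no single reward model dominates the ranking.
    scores = [sum(rm.score(prompt, img) for rm in reward_models) / len(reward_models)
              for img in images]
    order = sorted(range(n_images), key=lambda k: scores[k], reverse=True)
    return [(images[k], scores[k]) for k in order]   # best-to-worst, ready for listwise training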
October 20, 2025 at 12:35 PM
2/
Scalable Ranked Preference Optimization for Text-to-Image Generation
@shyamgopal.bsky.social, Huseyin Coskun, @zeynepakata.bsky.social, Sergey Tulyakov, Jian Ren, Anil Kag
[Paper]: arxiv.org/pdf/2410.18013
📍Hall I #1702
🕑Oct 22, Poster Session 4
October 20, 2025 at 12:35 PM
SUB enables rigorous stress-testing of interpretable models. We find that CBMs fail to generalize to these novel combinations of known concepts.
October 20, 2025 at 12:35 PM
To generate precise variations, we propose Tied Diffusion Guidance (TDG) — sharing noise across two parallel denoising processes to ensure correct class and attribute generation.
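A very rough sketch of one way two tied processes could be coupled (the exact coupling in TDG is an assumption here; assumes a diffusers-style stateless scheduler and a placeholder eps_model):

import torch

@torch.no_grad()
def tied_denoise(eps_model, scheduler, class_cond, attr_cond, shape, w=0.5):
    x_a = torch.randn(shape)          # process A: conditioned on the bird class
    x_b = x_a.clone()                 # process B: conditioned on the substituted attribute
    for t in scheduler.timesteps:
        eps_a = eps_model(x_a, t, class_cond)
        eps_b = eps_model(x_b, t, attr_cond)
        # Tie the two processes by sharing (mixing) their noise predictions at every step.
        x_a = scheduler.step((1 - w) * eps_a + w * eps_b, t, x_a).prev_sample
        x_b = scheduler.step((1 - w) * eps_b + w * eps_a, t, x_b).prev_sample
    return x_a, x_b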
October 20, 2025 at 12:35 PM
We introduce SUB, a fine-grained image & concept benchmark with 38,400 synthetic bird images 🦤.
Using 33 classes & 45 concepts (e.g., wing color, belly pattern), SUB tests how robust CBMs are to targeted concept variations.
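For example, checking a CBM's concept predictions on SUB could look roughly like this (sample fields and predict_concepts are hypothetical; the released benchmark's interface may differ):

# Does the concept bottleneck actually fire for the substituted attribute in each synthetic image?
def concept_substitution_accuracy(cbm, benchmark, threshold=0.5):
    correct = 0
    for sample in benchmark:                                  # image + the concept that was swapped in
        concepts = cbm.predict_concepts(sample.image)         # bottleneck concept activations
        predicted_present = concepts[sample.concept_index] > threshold
        correct += int(predicted_present == sample.concept_present)
    return correct / len(benchmark)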
October 20, 2025 at 12:35 PM
1/
SUB: Benchmarking CBM Generalization via Synthetic Attribute Substitutions
@jessica-bader.bsky.social, @lgirrbach.bsky.social, Stephan Alaniz, @zeynepakata.bsky.social
[Paper]: arxiv.org/pdf/2507.23784
[Code]: github.com/ExplainableM...
📍Hall I #2142
🕑Oct 23, Poster Session 5
October 20, 2025 at 12:35 PM
Are you at #ICCV2025? 🎉 @iccv.bsky.social
EML Munich is presenting two poster papers—come say hi to our authors!
Details in the thread 👇
October 20, 2025 at 12:35 PM
3/
Noise Hypernetworks: Amortizing Test-Time Compute in Diffusion Models
@lucaeyring.bsky.social, @shyamgopal.bsky.social, Alexey Dosovitskiy, @natanielruiz.bsky.social, @zeynepakata.bsky.social
[Paper]: arxiv.org/abs/2508.09968
[Code]: github.com/ExplainableM...
October 13, 2025 at 2:44 PM
🔥 These methods uncover internal “chunks” that correspond to meaningful concepts in both RNNs and LLMs. Activating a chunk causally steers the model toward that concept — so these chunks aren’t just correlations but functional building blocks.
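Conceptually, steering with a chunk is a simple activation-space intervention, e.g. via a forward hook (an illustrative sketch, not the paper's exact procedure; the layer path in the usage comment is hypothetical):

import torch

def steer_with_chunk(layer, chunk_direction, strength=4.0):
    # Add a unit-norm chunk direction to the layer's hidden states during the forward pass.
    direction = chunk_direction / chunk_direction.norm()
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        hidden = hidden + strength * direction.to(hidden.dtype)
        return (hidden,) + output[1:] if isinstance(output, tuple) else hidden
    return layer.register_forward_hook(hook)

# handle = steer_with_chunk(model.transformer.h[10], chunk_vec)  # hypothetical layer path
# ... generate text and observe the steered concept ...
# handle.remove()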
October 13, 2025 at 2:44 PM
🤔Neural networks are often labeled as 'black boxes', yet we found that population activity mirrors patterns in training data (the “Reflection Hypothesis”). Can we chunk those patterns into human-legible concepts?
October 13, 2025 at 2:44 PM
✨ Applying SAE interventions to CLIP’s vision encoder directly steers multimodal LLM outputs (e.g., LLaVA) without modifying the base model. These results highlight SAEs as a practical, unsupervised approach to improving both interpretability and control in VLMs.
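A minimal sketch of such an intervention (assumes a trained SAE exposing encode/decode; where exactly the edit is applied in the vision tower is a simplification):

import torch

def boost_sae_feature(vision_tokens, sae, feature_idx, value=10.0):
    # Encode CLIP vision tokens with the SAE, clamp one concept latent, decode back.
    z = sae.encode(vision_tokens)        # sparse latent codes, one per token
    z[..., feature_idx] = value          # activate the chosen concept feature
    return sae.decode(z)                 # edited tokens are passed on to the LLM; base model untouched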
October 13, 2025 at 2:44 PM
💡 Our experiments show that SAEs trained on VLMs substantially enhance neuron-level monosemanticity. Sparsity promotes disentanglement, while wider latent spaces yield richer, more human-interpretable concept representations.
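For reference, a bare-bones top-k sparse autoencoder over vision-encoder activations might look like this (a sketch; the paper's architecture and training details may differ):

import torch
import torch.nn as nn

class TopKSAE(nn.Module):
    def __init__(self, d_model, d_latent, k=32):
        super().__init__()
        self.enc = nn.Linear(d_model, d_latent)    # wider d_latent -> richer concept dictionary
        self.dec = nn.Linear(d_latent, d_model)
        self.k = k                                 # only k latents stay active per token (sparsity)

    def forward(self, x):
        z = torch.relu(self.enc(x))
        topk = torch.topk(z, self.k, dim=-1)
        z_sparse = torch.zeros_like(z).scatter_(-1, topk.indices, topk.values)
        recon = self.dec(z_sparse)                 # train with MSE(recon, x) on frozen VLM activations
        return recon, z_sparse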
October 13, 2025 at 2:44 PM
💬 To ensure our metric reflects meaningful structure, we validate it through a large-scale user study, showing strong alignment between our quantitative measure and human perception of monosemanticity.
October 13, 2025 at 2:44 PM
1/
Sparse Autoencoders Learn Monosemantic Features in Vision-Language Models
Mateusz Pach, @shyamgopal.bsky.social, @qbouniot.bsky.social, Serge Belongie, @zeynepakata.bsky.social
[Paper]: arxiv.org/pdf/2504.02821
[Code]: github.com/ExplainableM...
October 13, 2025 at 2:44 PM
🔥 We celebrate 3 papers accepted to NeurIPS 2025. See you in San Diego! 🥳 Topics include diffusion models, sparse autoencoders (SAEs), and neural chunking. See the thread for highlights 👇
October 13, 2025 at 2:44 PM
🎓PhD Spotlight: Karsten Roth

Celebrate @confusezius.bsky.social, who defended his PhD on June 24th summa cum laude!

🏁 His next stop: Google DeepMind in Zurich!

Join us in celebrating Karsten's achievements and wishing him the best for his future endeavors! 🥳
August 4, 2025 at 2:11 PM
🎓PhD Spotlight: Shyamgopal Karthik

Celebrate @shyamgopal.bsky.social, who will defend his PhD on 23rd June! Shyam has been a PhD student @unituebingen.bsky.social since October 2021, supervised by @zeynepakata.bsky.social.
June 18, 2025 at 9:53 AM
(4/4) FLAIR: VLM with Fine-grained Language-informed Image Representations
@rui-xiao.bsky.social will present his work on pretraining a CLIP-like model that generates fine-grained image representations.
📍 ExHall D Poster #368
⏲️ Sun 15 Jun 10:30 a.m. CDT — 12:30 p.m. CDT
June 11, 2025 at 1:13 PM
(3/4) How to Merge Your Multimodal Models Over Time?
@confusezius.bsky.social will also present this amazing work that introduces a unified framework for temporal model merging.
📍 ExHall D Poster #445
⏲️ Sat 14 Jun 5 p.m. CDT — 7 p.m. CDT
June 11, 2025 at 1:13 PM
(2/4) COSMOS: Cross-Modality Self-Distillation for Vision Language Pre-training
Sanghwan Kim will present COSMOS, which integrates a novel text-cropping strategy and a cross-attention module into a self-supervised learning framework.
📍 ExHall D Poster #387
⏲️ Sat 14 Jun 10:30 a.m. CDT — 12:30 p.m. CDT
June 11, 2025 at 1:13 PM
(1/4) Context-Aware Multimodal Pretraining (Highlight)
@confusezius.bsky.social will share his amazing work on turning vision-language models from great zero-shot models into strong few-shot learners.
📍 ExHall D Poster #391
⏲️ Fri 13 Jun 10:30 a.m. CDT — 12:30 p.m. CDT
June 11, 2025 at 1:13 PM
📢 Landed in Nashville🎺 for #CVPR2025! The EML group is presenting 4 exciting papers — come say hi at our poster sessions! More details in the thread — see you there! 🏁🌟
June 11, 2025 at 1:13 PM
🎓PhD Spotlight: Leonard Salewski

Celebrate @l-salewski.bsky.social, who will defend his PhD on 24th June! 📸🎉
May 26, 2025 at 12:09 PM