Paper: arxiv.org/abs/2504.16064
Code: github.com/zelaki/ReDi
~23x faster convergence than baseline DiT/SiT.
~6x faster than REPA.🚀
- Merged Tokens (MR): Efficient, keeps token count constant
- Separate Tokens (SP): More expressive, ~2x compute
Both boost performance, but MR hits the sweet spot for speed vs. quality.
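For intuition, here is a rough PyTorch sketch of how the two fusion variants differ. The shapes and the projection layer are made up for illustration, not taken from the repo:

```python
import torch
import torch.nn as nn

# Hypothetical shapes: N image-latent tokens and N semantic-feature tokens
# (e.g. DINOv2 patch features), both already projected to model width d.
N, d = 256, 768
img_tokens = torch.randn(1, N, d)   # noised VAE-latent tokens
sem_tokens = torch.randn(1, N, d)   # noised semantic-feature tokens

# Merged tokens (MR): fuse channel-wise, so the sequence length stays N
# and the transformer cost is unchanged.
merge_proj = nn.Linear(2 * d, d)
mr_seq = merge_proj(torch.cat([img_tokens, sem_tokens], dim=-1))  # (1, N, d)

# Separate tokens (SP): concatenate along the sequence axis, giving 2N
# tokens -- more expressive, but roughly 2x the attention compute.
sp_seq = torch.cat([img_tokens, sem_tokens], dim=1)               # (1, 2N, d)
```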
- Apply noise to both image latents and semantic features
- Fuse them into one token sequence
- Denoise both with standard DiT/SiT
That’s it.
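To make the recipe concrete, here is a minimal sketch of one training step. The noise schedule handling, the sequence-wise fusion (SP-style), the output split, and the plain noise-prediction loss are simplifying assumptions, not the repo's actual code:

```python
import torch
import torch.nn.functional as F

def redi_training_step(model, alphas_bar, vae_latents, sem_features, t):
    """Sketch of one joint-denoising step (not the repo's implementation).

    Assumes `model` is an unmodified DiT/SiT backbone that returns a
    prediction per input token, `alphas_bar` is a cumulative noise
    schedule, and a simple noise-prediction objective stands in for the
    paper's actual loss.
    """
    a = alphas_bar[t].view(-1, 1, 1)            # per-sample noise level

    # 1) Apply forward-diffusion noise to both image latents and features.
    eps_x = torch.randn_like(vae_latents)
    eps_z = torch.randn_like(sem_features)
    noisy_x = a.sqrt() * vae_latents + (1 - a).sqrt() * eps_x
    noisy_z = a.sqrt() * sem_features + (1 - a).sqrt() * eps_z

    # 2) Fuse the two modalities into one token sequence
    #    (sequence-wise concat here, i.e. SP; MR would merge channels instead).
    tokens = torch.cat([noisy_x, noisy_z], dim=1)

    # 3) Denoise both streams with the standard backbone and supervise each.
    pred_x, pred_z = model(tokens, t).chunk(2, dim=1)
    return F.mse_loss(pred_x, eps_x) + F.mse_loss(pred_z, eps_z)
```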
🔗 A powerful new method for generative image modeling that bridges generation and representation learning.
⚡️Brings massive gains in performance and training efficiency, and introduces a new paradigm for representation-aware generative modeling.
Joint work with @ikakogeorgiou.bsky.social, @spyrosgidaris.bsky.social and Nikos Komodakis
Paper: arxiv.org/abs/2502.09509
Code: github.com/zelaki/eqvae
HuggingFace Model: huggingface.co/zelaki/eq-va...
We trained DiT-B/2 on the resulting latents at each fine-tuning epoch. Even after just a few epochs, gFID drops significantly—showing how quickly EQ-VAE improves the latent space.
We find a strong correlation between latent space complexity and generative performance.
🔹 EQ-VAE reduces the intrinsic dimension (ID) of the latent manifold.
🔹 This makes the latent space simpler and easier to model.
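For context, intrinsic dimension can be estimated directly from a batch of flattened latents. The TwoNN estimator below is one standard choice, shown purely for illustration; it is not necessarily the estimator used in the paper:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def two_nn_intrinsic_dim(latents):
    """Intrinsic-dimension estimate via TwoNN (Facco et al., 2017).

    `latents` is an (n_samples, n_features) array of flattened latent
    vectors sampled from the autoencoder.
    """
    # Distances to the two nearest neighbours (column 0 is the point itself).
    dists, _ = NearestNeighbors(n_neighbors=3).fit(latents).kneighbors(latents)
    mu = dists[:, 2] / dists[:, 1]        # ratio of 2nd to 1st NN distance
    return len(mu) / np.log(mu).sum()     # maximum-likelihood estimate
```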
✅ DiT-XL/2: gFID drops from 19.5 → 14.5 at 400K iterations
✅ REPA: Training time 4M → 1M iterations (4× speedup)
✅ MaskGIT: Training time 300 → 130 epochs (2× speedup)
✅ Continuous autoencoders (SD-VAE, SDXL-VAE, SD3-VAE)
✅ Discrete autoencoders (VQ-GAN)
👉 It aligns reconstructions of transformed latents with the corresponding transformed inputs.
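In pseudocode, the regularization looks roughly like this. This is a minimal sketch assuming a VAE that exposes `encode`/`decode`, using downscaling as one example transformation; see the paper for the exact objective and transformation set:

```python
import torch
import torch.nn.functional as F

def equivariance_loss(vae, x, scale=0.5):
    """Sketch of the alignment term, not the paper's exact objective.

    Uses spatial downscaling as one example transformation applied to
    both the latent and the input image.
    """
    z = vae.encode(x)                                             # clean latent
    z_t = F.interpolate(z, scale_factor=scale, mode="bilinear")   # transform the latent
    x_t = F.interpolate(x, scale_factor=scale, mode="bilinear")   # same transform on the input
    # Reconstruction of the transformed latent should match the transformed input.
    return F.mse_loss(vae.decode(z_t), x_t)
```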
SOTA autoencoders reconstruct images well but fail to maintain equivariance in latent space.
✅ If you scale an input image, its reconstruction is fine
❌ But if you scale the latent representation directly, the reconstruction degrades significantly.
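A quick way to see this gap yourself (a sketch assuming a VAE exposing `encode`/`decode`; both reconstructions are scored against the downscaled input):

```python
import torch.nn.functional as F

def equivariance_gap(vae, x, scale=0.5):
    """Compare the two paths described above.

    (a) encode/decode the downscaled image: usually fine.
    (b) encode the full image, downscale the latent, then decode: degrades.
    """
    x_s = F.interpolate(x, scale_factor=scale, mode="bilinear")
    recon_a = vae.decode(vae.encode(x_s))                                    # path (a)
    z_s = F.interpolate(vae.encode(x), scale_factor=scale, mode="bilinear")
    recon_b = vae.decode(z_s)                                                # path (b)
    return F.mse_loss(recon_a, x_s), F.mse_loss(recon_b, x_s)
```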
✅ 7× faster training convergence on DiT-XL/2
✅ 4× faster training on REPA
🔹Smoother latent space = easier to model & better generative performance.
🔹No trade-off in reconstruction quality—rFID improves too!
🔹Works as a plug-and-play enhancement—no architectural changes needed!