~23x faster convergence than baseline DiT/SiT.
~6x faster than REPA.🚀
- Merged Tokens (MR): Efficient, keeps token count constant
- Separate Tokens (SP): More expressive, ~2x compute
Both boost performance, but MR hits the sweet spot for speed vs. quality.
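A minimal sketch of what the two fusion layouts could look like, assuming spatially aligned latent and DINOv2 tokens; the dimensions, projections, and variable names below are illustrative placeholders, not the paper's code:

```python
import torch
import torch.nn as nn

# Illustrative shapes: N image-latent tokens of width d_img,
# N semantic-feature tokens of width d_sem (assumed patch-aligned).
N, d_img, d_sem, d_model = 256, 16, 768, 1152

img_tokens = torch.randn(1, N, d_img)   # patchified VAE latents
sem_tokens = torch.randn(1, N, d_sem)   # DINOv2 patch features

# Merged Tokens (MR): concatenate along channels, project to model width.
# Token count stays N, so the transformer's cost is unchanged.
mr_proj = nn.Linear(d_img + d_sem, d_model)
mr_seq = mr_proj(torch.cat([img_tokens, sem_tokens], dim=-1))     # (1, N, d_model)

# Separate Tokens (SP): keep two token sets, concatenate along the sequence.
# Token count doubles to 2N, roughly doubling compute (per the thread's estimate).
sp_img = nn.Linear(d_img, d_model)(img_tokens)
sp_sem = nn.Linear(d_sem, d_model)(sem_tokens)
sp_seq = torch.cat([sp_img, sp_sem], dim=1)                       # (1, 2N, d_model)

print(mr_seq.shape, sp_seq.shape)
```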
- Apply noise to both image latents and semantic features
- Fuse them into one token sequence
- Denoise both with standard DiT/SiT
That’s it.
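A hedged sketch of that recipe as a single training step; `model` and `noise_schedule` are placeholders for a DiT/SiT-style backbone and its diffusion or interpolant schedule, and the epsilon-prediction loss is just one common choice, not necessarily the authors' exact objective:

```python
import torch

def joint_denoising_step(model, x_latent, x_sem, t, noise_schedule):
    """Illustrative step: noise both representations at the same timestep,
    fuse them into one token sequence, and denoise both with one backbone."""
    eps_latent = torch.randn_like(x_latent)
    eps_sem = torch.randn_like(x_sem)

    # Same schedule applied to both modalities.
    alpha, sigma = noise_schedule(t)                 # broadcastable over tokens
    z_latent = alpha * x_latent + sigma * eps_latent
    z_sem = alpha * x_sem + sigma * eps_sem

    # Fuse into one sequence (merged-token variant: channel concat).
    z = torch.cat([z_latent, z_sem], dim=-1)

    pred_latent, pred_sem = model(z, t)              # denoise both jointly
    return ((pred_latent - eps_latent) ** 2).mean() + \
           ((pred_sem - eps_sem) ** 2).mean()

# Tiny usage example with dummy stand-ins (purely illustrative):
B, N, d_img, d_sem = 2, 256, 16, 768
dummy_model = lambda z, t: (z[..., :d_img], z[..., d_img:])
toy_schedule = lambda t: (torch.cos(t), torch.sin(t))
x_latent, x_sem = torch.randn(B, N, d_img), torch.randn(B, N, d_sem)
t = torch.rand(B, 1, 1)
print(joint_denoising_step(dummy_model, x_latent, x_sem, t, toy_schedule))
```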
The model jointly denoises two kinds of representations:
– Low-level image details (via VAE latents)
– High-level semantic features (via DINOv2)🧵
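A sketch of how those two representations can be pulled from off-the-shelf models; the specific checkpoints (`stabilityai/sd-vae-ft-ema`, `dinov2_vitb14`) and the preprocessing are assumptions for illustration, not details from the thread:

```python
import torch
import torch.nn.functional as F
from diffusers import AutoencoderKL

# Low-level stream: VAE latents (checkpoint choice is an assumption).
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-ema").eval()

# High-level stream: DINOv2 patch features (ViT size is an assumption).
dino = torch.hub.load("facebookresearch/dinov2", "dinov2_vitb14").eval()

@torch.no_grad()
def extract_representations(img):            # img: (B, 3, 256, 256) in [-1, 1]
    latents = vae.encode(img).latent_dist.sample()                # (B, 4, 32, 32)
    # Resize so the 14x14 patch grid is 16x16 (ImageNet normalization
    # omitted here for brevity).
    dino_in = F.interpolate(img, size=224, mode="bicubic")
    feats = dino.forward_features(dino_in)["x_norm_patchtokens"]  # (B, 256, 768)
    return latents, feats
```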
We trained DiT-B/2 on the resulting latents at each fine-tuning epoch. Even after just a few epochs, gFID drops significantly—showing how quickly EQ-VAE improves the latent space.
We find a strong correlation between latent space complexity and generative performance.
🔹 EQ-VAE reduces the intrinsic dimension (ID) of the latent manifold.
🔹 This makes the latent space simpler and easier to model.
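For context on what "intrinsic dimension" means here: it can be estimated from nearest-neighbor distance ratios. Below is a minimal TwoNN-style estimator (Facco et al., 2017); the estimator and data handling used in the paper may differ, so treat this as background only:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def twonn_intrinsic_dimension(x):
    """TwoNN estimate of intrinsic dimension for points x of shape (N, D).
    Uses the ratio of each point's 2nd to 1st nearest-neighbor distance."""
    nn = NearestNeighbors(n_neighbors=3).fit(x)          # self + two neighbors
    dists, _ = nn.kneighbors(x)
    mu = dists[:, 2] / np.maximum(dists[:, 1], 1e-12)    # r2 / r1 per point
    mu = mu[np.isfinite(mu) & (mu > 1.0)]                # drop degenerate pairs
    return len(mu) / np.sum(np.log(mu))                  # maximum-likelihood fit

# Example: flattened latents, shape (num_samples, C*H*W).
latents = np.random.randn(2000, 8).astype(np.float32)
print(twonn_intrinsic_dimension(latents))   # roughly the ambient dim for isotropic noise
```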
✅ Continuous autoencoders (SD-VAE, SDXL-VAE, SD3-VAE)
✅ Discrete autoencoders (VQ-GAN)
👉 It aligns reconstructions of transformed latents with the corresponding transformed inputs.
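Sketched as a loss term, with spatial scaling as the example transform and placeholder `encoder`/`decoder` callables; the paper may sample other transformations and keeps the usual reconstruction objective alongside this term, so this is a simplified illustration rather than the actual implementation:

```python
import torch
import torch.nn.functional as F

def equivariance_loss(encoder, decoder, x, scale=0.5):
    """EQ-VAE-style term: decoding a transformed latent should match the
    same transform applied to the input image."""
    z = encoder(x)                                                  # (B, C, h, w)
    z_t = F.interpolate(z, scale_factor=scale, mode="bilinear")     # transform latent
    x_t = F.interpolate(x, scale_factor=scale, mode="bilinear")     # transform input
    return F.mse_loss(decoder(z_t), x_t)
```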
SOTA autoencoders reconstruct images well but fail to maintain equivariance in latent space.
✅ If you scale an input image, its reconstruction is fine.
❌ But if you scale the latent representation directly, the reconstruction degrades significantly.
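One way to reproduce that observation with an off-the-shelf VAE; the checkpoint name and the diffusers usage are assumptions, and the reconstruction errors returned here are only a rough proxy for the visible degradation:

```python
import torch
import torch.nn.functional as F
from diffusers import AutoencoderKL

vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-ema").eval()

@torch.no_grad()
def compare_scaling_paths(img, scale=0.5):       # img: (B, 3, H, W) in [-1, 1]
    # Path A: scale the image first, then encode/decode -> reconstruction is fine.
    img_s = F.interpolate(img, scale_factor=scale, mode="bilinear")
    rec_a = vae.decode(vae.encode(img_s).latent_dist.mode()).sample

    # Path B: encode, scale the latent, then decode -> visibly degraded for
    # standard autoencoders (the non-equivariance EQ-VAE targets).
    z = vae.encode(img).latent_dist.mode()
    z_s = F.interpolate(z, scale_factor=scale, mode="bilinear")
    rec_b = vae.decode(z_s).sample

    return F.mse_loss(rec_a, img_s), F.mse_loss(rec_b, img_s)
```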