@mnoukhov.bsky.social & @AaronCourville🙏
💾 github.com/lavoiems/Dis...
📜 https://arxiv.org/pdf/2507.12318
Reproduce, remix, build your own DLC-powered models.
These hybrid images never appear in the training set; we verify this with a nearest-neighbor search in DINOv2 space.
But our model can still create them, by composing concepts it learned separately.
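A minimal sketch of that check, assuming preprocessed batches `train_images` (the reference set) and `generated` (samples to verify) are already loaded; the DINOv2 backbone is pulled from torch.hub as a frozen feature extractor:

```python
import torch

# DINOv2 ViT-B/14 from torch.hub, used only to embed images.
dinov2 = torch.hub.load("facebookresearch/dinov2", "dinov2_vitb14")
dinov2.eval()

@torch.no_grad()
def embed(x):
    # x: (N, 3, 224, 224) preprocessed images -> (N, D) unit-norm CLS features.
    feats = dinov2(x)
    return torch.nn.functional.normalize(feats, dim=-1)

bank = embed(train_images)                 # embeddings of training images
query = embed(generated)                   # embeddings of generated samples
sims = query @ bank.T                      # cosine similarity (both sides unit-norm)
nearest = sims.topk(k=5, dim=-1).indices   # 5 closest training images per sample
# Visually inspect train_images[nearest]: if no neighbor resembles the generated
# hybrid, the sample was composed rather than copied from the training set.
```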
We fine-tune a language model to sample DLC tokens from text, giving us a pipeline:
Text → DLC → Image
This also enables generation beyond ImageNet.
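Roughly, the pipeline looks like the sketch below; `lm.sample_dlc` and `dit_decoder.sample` are hypothetical handles for the fine-tuned language model and the DLC-conditioned image decoder, not a released API:

```python
# Text -> DLC -> Image, with placeholder model objects and illustrative method names.
def generate_from_text(prompt: str, lm, dit_decoder, dlc_len: int = 32):
    # 1) The fine-tuned LM samples a DLC token sequence for the prompt.
    dlc_tokens = lm.sample_dlc(prompt, length=dlc_len)   # sequence of discrete codes
    # 2) The diffusion decoder generates an image conditioned on that code.
    return dit_decoder.sample(condition=dlc_tokens)
```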
Swap tokens between two images (🐕 Komondor + 🍝 Carbonara) → the model produces coherent hybrids never seen during training.
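A sketch of that token-swapping experiment, with placeholder handles `encode_dlc` (the SEM encoder) and `decode` (the DLC-conditioned decoder); a random position mask is one simple way to mix the two codes:

```python
import torch

def hybrid(img_a, img_b, encode_dlc, decode, frac_from_b=0.5):
    dlc_a = encode_dlc(img_a)                        # (L,) tokens of image A
    dlc_b = encode_dlc(img_b)                        # (L,) tokens of image B
    mask = torch.rand(dlc_a.shape[0]) < frac_from_b  # positions to take from image B
    mixed = torch.where(mask, dlc_b, dlc_a)          # swapped DLC sequence
    return decode(mixed)                             # decode the hybrid code into an image
```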
DiT-XL/2 + DLC → FID 1.59 on unconditional ImageNet
Works well with and without classifier-free guidance
Learns faster and performs better than prior works that use pre-trained encoders
🤯
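For reference, classifier-free guidance with a DLC condition is the usual extrapolation from the unconditional prediction toward the conditional one; the epsilon-parameterized `model` and the `null_dlc` drop-token in this sketch are assumptions, not the paper's exact interface:

```python
def guided_eps(model, x_t, t, dlc, null_dlc, scale=1.5):
    eps_cond = model(x_t, t, cond=dlc)         # prediction conditioned on the image's DLC
    eps_uncond = model(x_t, t, cond=null_dlc)  # unconditional prediction (conditioning dropped)
    return eps_uncond + scale * (eps_cond - eps_uncond)  # guided noise estimate
```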
To generate an image:
1. Sample a DLC (e.g., with SEDD)
2. Decode it into an image (e.g., with DiT)
This ancestral sampling approach is simple but powerful.
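In sketch form, with `sedd_prior` and `dit_decoder` as placeholders for the trained SEDD and DiT models:

```python
def sample_image(sedd_prior, dit_decoder):
    dlc = sedd_prior.sample()                 # c ~ p(c): draw a discrete latent code
    return dit_decoder.sample(condition=dlc)  # x ~ p(x | c): decode it into an image
```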
Images → sequences of discrete tokens via a Simplicial Embedding (SEM) encoder
We take the argmax over token distributions → get the DLC sequence
Think of it as “tokenizing” images—like words for LLMs.
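A sketch of that encoding step, with `sem_encoder` as a placeholder for the trained SEM encoder and illustrative shapes (L token positions, each a distribution over a V-way vocabulary):

```python
import torch

@torch.no_grad()
def image_to_dlc(image, sem_encoder):
    logits = sem_encoder(image)        # (L, V): one V-way distribution per token position
    probs = logits.softmax(dim=-1)     # simplicial embedding: a point on each of L simplices
    return probs.argmax(dim=-1)        # (L,) DLC: the most likely symbol at each position
```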
So… can we improve image generation for highly multi-modal distributions by decomposing it into:
1. Generating discrete tokens - p(c)
2. Decoding tokens into images - p(x|c)
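That factorization is just p(x) = Σ_c p(c)·p(x|c), sampled ancestrally (c first, then x). A toy numeric version, with made-up values:

```python
import torch

p_c = torch.tensor([0.2, 0.5, 0.3])      # prior over discrete codes, p(c)
means = torch.tensor([-4.0, 0.0, 4.0])   # each code c indexes a simple conditional p(x|c)

def sample():
    c = torch.multinomial(p_c, 1).item() # c ~ p(c)
    x = means[c] + torch.randn(())       # x ~ p(x|c) = N(mean_c, 1)
    return c, x
```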
Even a simple 2D Gaussian mixture with a large number of modes may be tricky to model directly. Good conditioning solves this!
Could this be why large image generative models are almost always conditional? 🤔
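A toy version of that argument, with illustrative values: a 2D mixture over a 5×5 grid of modes has a complicated marginal, but each conditional p(x|c) is a single Gaussian, which is trivial to model:

```python
import torch

# 5x5 grid of mode centers, spaced far apart so the marginal has 25 clear modes.
centers = 4.0 * torch.cartesian_prod(torch.arange(5.0), torch.arange(5.0))  # (25, 2)

def sample(n=1000):
    c = torch.randint(0, 25, (n,))            # c ~ p(c): uniform over the 25 modes
    x = centers[c] + 0.3 * torch.randn(n, 2)  # x ~ p(x|c): one Gaussian around center c
    return c, x

# A model that is given c only has to place mass around one center;
# a model of p(x) alone has to carve out all 25 modes at once.
```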