Working on conditional diffusion models
Do you think people would be up for that? Do you think it would make for a nice competition?
- Access the models on Hugging Face: huggingface.co/Lucasdegeorge/CAD-I
- Train your own text-to-image models using our setup: github.com/lucasdegeorge/T2I-ImageNet
- Check out the project page: lucasdegeorge.github.io/projects/t2i...
- Achieved a +2 overall score over SD-XL on GenEval
- Gained +5 points on DPGBench 🏆
- Used only 1/10th of the model parameters
- Trained on 1/1000th of the typical number of training images
- Detailed Recaptioning: Transforming limited captions into rich, context-aware captions that capture styles, backgrounds, and actions.
- Composition: Using CutMix to create diverse concept combinations, expanding the dataset's learning potential.
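For the composition step, a minimal sketch of the standard CutMix operation (paste a random rectangular patch of one image into another) looks like this. This is an illustration of the general technique, not the project's exact pipeline; the function name and the Beta(1, 1) mixing prior are assumptions.

```python
import numpy as np

def cutmix(img_a, img_b, rng=None):
    """Paste a random rectangular patch of img_b into img_a (standard CutMix).

    Both images are H x W x C arrays of the same shape. The mixing ratio
    lambda is drawn from Beta(1, 1) (uniform), and the patch area covers
    a (1 - lambda) fraction of the image.
    """
    rng = rng or np.random.default_rng()
    h, w = img_a.shape[:2]
    lam = rng.beta(1.0, 1.0)
    # Patch side lengths chosen so the patch area fraction is (1 - lambda)
    cut_h = int(h * np.sqrt(1.0 - lam))
    cut_w = int(w * np.sqrt(1.0 - lam))
    # Random patch center, clipped to stay inside the image
    cy, cx = int(rng.integers(h)), int(rng.integers(w))
    y1, y2 = max(cy - cut_h // 2, 0), min(cy + cut_h // 2, h)
    x1, x2 = max(cx - cut_w // 2, 0), min(cx + cut_w // 2, w)
    mixed = img_a.copy()
    mixed[y1:y2, x1:x2] = img_b[y1:y2, x1:x2]
    return mixed
```

Pairing the mixed image with a caption that mentions both source concepts is what turns this into new text-image training data.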
But do we really need billions of images?
Not if we are careful enough!
We trained text-to-image models on 1000x less data in just 200 GPU hours, achieving good image quality and strong benchmark performance.