Working on conditional diffusion models
Do you think people would be up for that? Do you think it would make for a nice competition?
- Access the models on Hugging Face: huggingface.co/Lucasdegeorge/CAD-I
- Train your own text-to-image models using our setup: github.com/lucasdegeorge/T2I-ImageNet
- Check out the project page: lucasdegeorge.github.io/projects/t2i...
- Achieved a +2 overall score over SD-XL on GenEval
- Gained +5 points on DPGBench 🏆
- Used only 1/10th of the model parameters
- Trained on 1/1000th of the typical number of training images
- Detailed Recaptioning: Transforming limited captions into rich, context-aware captions that capture styles, backgrounds, and actions.
- Composition: Using CutMix to create diverse concept combinations, expanding the dataset's learning potential.
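For the composition step, a minimal sketch of the standard CutMix operation (paste a random rectangular patch of one image into another) looks like this. This is an illustration of the general technique, not the project's exact pipeline; the function name and the Beta(1, 1) mixing prior are assumptions.

```python
import numpy as np

def cutmix(img_a, img_b, rng=None):
    """Paste a random rectangular patch of img_b into img_a (standard CutMix).

    Both images are H x W x C arrays of the same shape. The mixing ratio
    lambda is drawn from Beta(1, 1) (uniform), and the patch area covers
    a (1 - lambda) fraction of the image.
    """
    rng = rng or np.random.default_rng()
    h, w = img_a.shape[:2]
    lam = rng.beta(1.0, 1.0)
    # Patch side lengths chosen so the patch area fraction is (1 - lambda)
    cut_h = int(h * np.sqrt(1.0 - lam))
    cut_w = int(w * np.sqrt(1.0 - lam))
    # Random patch center, clipped to stay inside the image
    cy, cx = int(rng.integers(h)), int(rng.integers(w))
    y1, y2 = max(cy - cut_h // 2, 0), min(cy + cut_h // 2, h)
    x1, x2 = max(cx - cut_w // 2, 0), min(cx + cut_w // 2, w)
    mixed = img_a.copy()
    mixed[y1:y2, x1:x2] = img_b[y1:y2, x1:x2]
    return mixed
```

Pairing the mixed image with a caption that mentions both source concepts is what turns this into new text-image training data.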
But do we really need billions of images?
Not if we are careful enough!
We trained text-to-image models on 1000x less data in just 200 GPU hours, achieving good image quality and strong benchmark performance.