@yannnke.bsky.social
See more work from the @NVIDIA Toronto AI Lab here: research.nvidia.com/labs/toronto...

11/n
December 4, 2024 at 11:02 PM
In the future, we want real-time, high-quality, inexpensive diffusion models that generate immersive experiences on the fly. Our method is one route toward such 2D/3D/4D/audio/video generative models, which could, for example, be deployed in virtual reality applications.

10/n
December 4, 2024 at 11:02 PM
Our work is inspired by underlying advances in diffusion distillation methods, including DMD: arxiv.org/abs/2311.18828, DMD2: arxiv.org/abs/2405.14867, CD: arxiv.org/abs/2303.01469, SiD: arxiv.org/abs/2404.04057.

9/n
December 4, 2024 at 11:00 PM
The following future work avenues are exciting:
1) More sophisticated routing mechanisms may further boost performance (see the sketch after this list).
2) Better methods for reducing student size will likely improve both quality and latency.
3) Sharing/interaction schemes among students may enhance training efficiency.
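Purely to illustrate future-work item 1 (this is speculation, not something from the paper): a routing mechanism for a text-to-image setup could be as simple as sending each prompt to the student whose training partition is semantically closest. The centroid construction and embedding space here are entirely hypothetical.

```python
# Hypothetical nearest-centroid router (illustrative only, not from the paper).
import torch

class NearestCentroidRouter:
    def __init__(self, centroids: torch.Tensor):
        # centroids: (K, D) tensor, one prompt-embedding centroid per student.
        self.centroids = centroids

    def route(self, prompt_embedding: torch.Tensor) -> int:
        # Pick the student whose partition centroid is closest to the prompt.
        distances = torch.cdist(prompt_embedding[None, :], self.centroids)[0]
        return int(distances.argmin())
```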

8/n
December 4, 2024 at 10:58 PM
We also successfully distill into much smaller students with competitive generation quality.

7/n
December 4, 2024 at 10:58 PM
Multi-Student Distillation (MSD) substantially improves FID on class-conditional ImageNet-64x64 generation and zero-shot text-to-image COCO2014 generation, outperforming its single-student counterparts with only 4 students.

6/n
December 4, 2024 at 10:57 PM
Distilling into smaller – and thus faster – students poses challenges such as weight initialization. We resolve this with an additional teacher score matching (TSM) stage. TSM trains multi-step students to emulate the teacher's scores, providing useful weight initializations.
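A minimal sketch of what such a TSM-style stage could look like, assuming EDM-style denoisers called as model(noisy_x, sigma) and a plain MSE between teacher and student predictions; the exact loss weighting, noise schedule, and multi-step setup in the paper may differ.

```python
# Illustrative teacher-score-matching update (assumptions noted above).
import math
import torch

def tsm_step(student, teacher, x0, optimizer, sigma_min=0.002, sigma_max=80.0):
    """One TSM update: the (possibly smaller) student mimics the teacher's
    denoised prediction at a random noise level, giving it a useful weight
    initialization before the main distillation stage."""
    # Sample noise levels log-uniformly (a common choice for EDM-style models).
    log_sigma = torch.empty(x0.shape[0], device=x0.device).uniform_(
        math.log(sigma_min), math.log(sigma_max))
    sigma = log_sigma.exp().view(-1, 1, 1, 1)  # assumes image tensors (B, C, H, W)

    noisy_x = x0 + sigma * torch.randn_like(x0)

    with torch.no_grad():
        teacher_pred = teacher(noisy_x, sigma)  # teacher's denoised estimate

    student_pred = student(noisy_x, sigma)
    loss = torch.mean((student_pred - teacher_pred) ** 2)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```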

5/n
December 4, 2024 at 10:56 PM
Recent works have distilled diffusion models into as few as a single step. We further increase generation quality and speed.

1) At training time, MSD partitions the dataset and assigns each partition to a different student;
2) At inference time, MSD uses only a single student.
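To make the two steps concrete, here is a minimal sketch for a class-conditional model; the modulo-based partitioning rule and the distill_student / make_student helpers are illustrative assumptions, not the exact scheme from the paper.

```python
# Minimal sketch of MSD's partition-and-route idea (assumptions noted above).

K = 4  # number of students

def partition_id(class_label: int) -> int:
    # Training time: decide which student owns this example.
    return class_label % K

def train_students(dataset, teacher, make_student, distill_student):
    students = [make_student() for _ in range(K)]
    for k in range(K):
        subset = [(x, c) for (x, c) in dataset if partition_id(c) == k]
        # Each student is distilled from the teacher on its own subset only.
        distill_student(students[k], teacher, subset)
    return students

def generate(students, class_label, noise):
    # Inference time: only the single responsible student is run,
    # so per-sample cost matches a single (possibly smaller) model.
    student = students[partition_id(class_label)]
    return student(noise, class_label)
```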

4/n
December 4, 2024 at 10:55 PM
MSD achieves SOTA generation quality with students the same size as the teacher, as in other distillation works.

It also allows improved latency through its new ability to train smaller students, focusing each student's capacity on a different subset of the data.

3/n
December 4, 2024 at 10:55 PM
Project page: research.nvidia.com/labs/toronto...
Paper: arxiv.org/abs/2410.23274

w/ @jonLorraine9, Weili Nie, Karsten Kreis, James Lucas

Supported (indirectly) by the Harvard Statistics Department, @vectorinst.bsky.social, and the Department of Computer Science, University of Toronto

2/n
December 4, 2024 at 10:54 PM