Thanks to causal dependencies, CausVid enables a wide range of additional applications without the need for fine-tuning!
5.1/ Image-to-Video: Treating an input image as the first generated frame, our method can naturally animate static images.
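Here's a rough sketch of what that first-frame conditioning could look like with a frame-by-frame causal generator. The names (encode_frame, next_frame, decode_frame) are hypothetical placeholders for illustration, not the released CausVid API.

```python
# Minimal sketch of image-to-video via first-frame conditioning.
# All function names below are assumed placeholders, not the actual CausVid code.
import torch

def animate_image(image, causal_generator, encode_frame, decode_frame, num_frames=120):
    # Treat the input image as if it were the already-generated first frame.
    context = [encode_frame(image)]        # latent of the conditioning image
    frames = [image]                       # image tensor, same shape as decoded frames
    for t in range(1, num_frames):
        # Each new frame only attends to past frames (causal dependency),
        # so conditioning on a real image needs no fine-tuning.
        latent_t = causal_generator.next_frame(context)
        context.append(latent_t)
        frames.append(decode_frame(latent_t))
    return torch.stack(frames)
```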
⚡ 170x faster (latency: 3.5 min -> 1.3 s)
⚡ 16x higher throughput (0.6 -> 9.4 FPS)
🏅 Great quality (1st place on VBench)!
We distill a multi-step, bidirectional video diffusion model into a few-step, causal model that generates video frames on the fly. Think of it like switching from downloading a whole movie to streaming it: you can start watching as soon as the first frame is ready.
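A minimal sketch of that streaming loop, assuming a frame-level causal denoiser that keeps a cache of past frames; init_cache, few_step_denoise, update_cache, and decode are hypothetical names for illustration, not the actual CausVid interface.

```python
# Sketch of few-step, causal (streaming) generation: each frame starts from noise,
# is denoised in only a few steps, and is ready to display immediately.
import torch

def stream_video(model, text_prompt, num_frames=120, num_steps=4, latent_shape=(16, 32, 32)):
    kv_cache = model.init_cache(text_prompt)        # cached keys/values of past frames
    for t in range(num_frames):
        latent = torch.randn(latent_shape)          # start each frame from noise
        for _ in range(num_steps):                  # few denoising steps per frame
            latent = model.few_step_denoise(latent, kv_cache)
        kv_cache = model.update_cache(kv_cache, latent)
        yield model.decode(latent)                  # emit the frame as soon as it's denoised
```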
⏳ Current video diffusion models need several minutes to create just a 10-sec clip. Why so slow? A major issue is that these models can't show you anything until they've generated the entire video. Each frame is linked to both past and future frames through bidirectional attention.
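One way to see the bottleneck: with a bidirectional mask, every frame attends to future frames, so no frame can be finalized before the whole clip exists; a block-causal mask over frames removes that dependency. This is my own illustration, not the paper's code.

```python
# Bidirectional vs. block-causal attention masks over video frames.
import torch

def frame_attention_mask(num_frames, tokens_per_frame, causal):
    n = num_frames * tokens_per_frame
    if not causal:
        # Bidirectional: every token attends to every other token,
        # including tokens of future frames.
        return torch.ones(n, n, dtype=torch.bool)
    frame_id = torch.arange(n) // tokens_per_frame
    # Block-causal: query token i may attend to key token j only if
    # j's frame is not in the future relative to i's frame.
    return frame_id.unsqueeze(1) >= frame_id.unsqueeze(0)
```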