Xun Huang
xunhuang.bsky.social
Sr Research Scientist at Adobe, Adjunct Professor at CMU
Previously at NVIDIA, Ph.D. at Cornell.
Snap & NVIDIA & Adobe Fellowship Recipient.
Views are my own.
xunhuang.me
5.3/ Dynamic Prompting: Interactively generate a story containing a sequence of events by modifying text prompts during generation. Here's a real-time demo of our UI.
December 30, 2024 at 9:06 PM
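A rough sketch of how the prompt switching in 5.3/ could look from the generation loop's point of view. This is an illustration only, not the CausVid UI code: the `schedule` list stands in for live user edits, and `prompt_at` is a hypothetical helper. The point is that a causal model only needs the prompt that is active at the frame it is about to generate.

```python
def prompt_at(schedule, t):
    """Return the prompt active at frame t.

    schedule: list of (start_frame, prompt) pairs sorted by start_frame;
    the first entry is assumed to start at frame 0. (Hypothetical format.)
    """
    current = schedule[0][1]
    for start, prompt in schedule:
        if start <= t:
            current = prompt
    return current


# Stand-in for a user editing the prompt while the video is streaming.
schedule = [(0, "a cat sits on a sofa"), (4, "the cat jumps off the sofa")]
prompts_per_frame = [prompt_at(schedule, t) for t in range(6)]
```

Frames generated before the switch stay as they are; only frames from the switch point onward are conditioned on the new text.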
5.2/ Streaming Video-to-Video Translation: Here's an example of transforming a Minecraft video into a realistic one (in real time!). In the future of gaming, maybe we could render only basic geometries, with AI video models adding all textures and lighting.
December 30, 2024 at 9:06 PM
5/ Applications:
Thanks to causal dependencies, CausVid enables a wide range of additional applications without the need for fine-tuning!
5.1/ Image-to-Video: Treating an input image as the first generated frame, our method can naturally animate static images.
December 30, 2024 at 9:06 PM
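The image-to-video idea in 5.1/ can be sketched as an autoregressive loop whose history starts with the input image instead of an empty list. This is a minimal sketch under assumptions: `next_frame_fn` is a hypothetical stand-in for the causal model, and integers stand in for frames.

```python
def animate(image, num_new_frames, next_frame_fn):
    """Treat `image` as frame 0, then extend the sequence autoregressively."""
    frames = [image]
    for _ in range(num_new_frames):
        # Each new frame is predicted from the frames generated so far.
        frames.append(next_frame_fn(frames))
    return frames


def toy_next_frame(history):
    """Hypothetical model stand-in: the next 'frame' is the last one plus 1."""
    return history[-1] + 1


clip = animate(10, 3, toy_next_frame)
```

Because the loop never looks at future frames, seeding it with a given first frame needs no change to the model itself, which matches the "no fine-tuning" claim above.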
4/ CausVid can generate infinitely long videos efficiently by combining sliding-window and KV-cache inference. Here we show that it produces long (30-second) videos with better quality than existing methods, while being orders of magnitude faster.
December 30, 2024 at 9:06 PM
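A toy sketch of the sliding-window plus KV-cache idea from 4/. It is not the CausVid implementation: `step_fn` and `toy_step` are hypothetical stand-ins for the model, and the cache holds plain integers instead of key/value tensors. It only shows the bookkeeping: each step reuses cached entries from recent frames, and the oldest entry is dropped once the window is full, so memory stays bounded however long the video runs.

```python
from collections import deque


def generate_stream(num_frames, window, step_fn):
    """Yield frames one at a time, keeping at most `window` cached entries."""
    cache = deque(maxlen=window)  # oldest entry is evicted automatically
    for t in range(num_frames):
        frame, kv = step_fn(t, list(cache))
        cache.append(kv)
        yield frame


def toy_step(t, cached):
    """Hypothetical model stand-in: record which past steps were visible."""
    frame = {"index": t, "context": list(cached)}
    return frame, t  # the step index plays the role of this frame's KV entry


frames = list(generate_stream(num_frames=6, window=3, step_fn=toy_step))
```

With a window of 3, frame 5 sees only the entries for frames 2, 3, and 4; earlier entries have already been evicted.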
3/ Results:
⚡ 170x Faster (3.5 mins -> 1.3 sec latency)
⚡ 16x Higher throughput (0.6 -> 9.4 FPS)
🏅 Great quality (1st spot on VBench)!
December 30, 2024 at 9:06 PM
2/ Our Solution:
We distill a multi-step, bidirectional video diffusion model into a few-step, causal model that generates video frames on the fly. Think of it like switching from downloading a whole movie to streaming it: you can start watching as soon as the first frame is ready.
December 30, 2024 at 9:06 PM
1/ The Problem:
⏳ Current video diffusion models need several minutes to create just a 10-sec clip. Why so slow? A major issue is that these models can't show you anything until they've generated the entire video. Each frame is linked to both past and future frames through bidirectional attention.
December 30, 2024 at 9:06 PM
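To make the latency argument in 1/ concrete, here is a small frame-level sketch of the two attention patterns. It is a simplification under assumptions: real models attend over tokens or latent chunks rather than whole frames, and both function names are hypothetical. With a bidirectional mask the first frame depends on every other frame, so nothing can be shown until the whole clip exists; with a causal mask the first frame depends only on itself.

```python
def attention_mask(num_frames, causal):
    """mask[i][j] is True when frame i may attend to frame j."""
    return [
        [(j <= i) if causal else True for j in range(num_frames)]
        for i in range(num_frames)
    ]


def frames_needed_before_display(mask, frame_index):
    """How many frames must exist before `frame_index` can be finalized."""
    last_dependency = max(
        j for j, allowed in enumerate(mask[frame_index]) if allowed
    )
    return last_dependency + 1


bidirectional = attention_mask(4, causal=False)
causal = attention_mask(4, causal=True)
wait_bidirectional = frames_needed_before_display(bidirectional, 0)
wait_causal = frames_needed_before_display(causal, 0)
```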
🚀 Introducing CausVid: An autoregressive video generation model that starts playing instantly as you hit 'Generate,' while also securing the 1st spot on VBench!

Project Page: causvid.github.io. More details in the long thread.
December 30, 2024 at 9:06 PM