Xun Huang

@xunhuang.bsky.social

Sr Research Scientist at Adobe, Adjunct Professor at CMU
Previously at NVIDIA, Ph.D. at Cornell.
Snap & NVIDIA & Adobe Fellowship Recipient.
Views are my own.
xunhuang.me

Xun Huang

@xunhuang.bsky.social

5.3/ Dynamic Prompting: Interactively generate a story containing a sequence of events by modifying text prompts during generation. Here's a real-time demo of our UI.

December 30, 2024 at 9:06 PM
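
A minimal sketch of how dynamic prompting fits a frame-by-frame loop, assuming a hypothetical `render` callback in place of the real model. Because each frame is produced in sequence, the active prompt can be swapped between steps. All names here (`generate_with_prompts`, `prompt_changes`, `toy_render`) are illustrative and not taken from CausVid.

```python
def generate_with_prompts(num_frames, prompt_changes, render):
    """Generate frames one at a time, switching prompts mid-generation.

    prompt_changes maps a frame index to the prompt that takes effect there.
    """
    prompt = None
    frames = []
    for t in range(num_frames):
        # Pick up a new prompt if the user changed it at this step.
        prompt = prompt_changes.get(t, prompt)
        frames.append(render(t, prompt, frames))
    return frames


def toy_render(t, prompt, history):
    # Hypothetical stand-in: a real model would generate an image here,
    # conditioned on the prompt and on the previously generated frames.
    return f"frame {t}: {prompt}"


story = generate_with_prompts(
    4, {0: "a cat sits", 2: "the cat jumps"}, toy_render
)
for line in story:
    print(line)
```

Frames 0 and 1 use the first prompt, and frames 2 and 3 use the second, without restarting generation.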

Xun Huang

@xunhuang.bsky.social

5.2/ Streaming Video-to-Video Translation: Here's an example of transforming a Minecraft video into a realistic one (in real time!). In the future of gaming, maybe we could render only basic geometries, with AI video models adding all textures and lighting.

December 30, 2024 at 9:06 PM

Xun Huang

@xunhuang.bsky.social

5/ Applications:
Thanks to causal dependencies, CausVid enables a wide range of additional applications without the need for fine-tuning!
5.1/ Image-to-Video: Treating an input image as the first generated frame, our method can naturally animate static images.

December 30, 2024 at 9:06 PM
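
The image-to-video idea can be sketched as an autoregressive rollout seeded with the input image. This is a toy illustration under stated assumptions: frames are plain integers, and `toy_next_frame` is a hypothetical stand-in for a model that predicts the next frame from the frames before it.

```python
def animate(first_frame, num_frames, next_frame):
    """Roll out a video whose first frame is given instead of generated."""
    frames = [first_frame]
    while len(frames) < num_frames:
        # Each new frame depends only on frames that already exist.
        frames.append(next_frame(frames))
    return frames


def toy_next_frame(history):
    # Hypothetical stand-in for the model: derive the next frame
    # from the most recent one.
    return history[-1] + 1


video = animate(first_frame=10, num_frames=4, next_frame=toy_next_frame)
print(video)  # [10, 11, 12, 13]
```

No special image-conditioning path is needed in this sketch, because the loop cannot tell a supplied first frame from a generated one.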

Xun Huang

@xunhuang.bsky.social

4/ CausVid can efficiently generate infinitely long videos by combining sliding-window and KV-cache inference. Here we show that it produces long (30-second) videos with better quality than existing methods, while being orders of magnitude faster.

December 30, 2024 at 9:06 PM
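
A rough sketch of why a sliding window plus a KV cache keeps per-frame cost bounded, no matter how long the video gets. It assumes a fixed window measured in frames and uses placeholder tuples instead of real key/value tensors. The class and function names are hypothetical, not CausVid's actual code.

```python
from collections import deque


class SlidingWindowKVCache:
    """Keeps keys/values for only the most recent `window` frames."""

    def __init__(self, window):
        # deque(maxlen=...) silently evicts the oldest entry when full.
        self.keys = deque(maxlen=window)
        self.values = deque(maxlen=window)

    def append(self, key, value):
        self.keys.append(key)
        self.values.append(value)

    def __len__(self):
        return len(self.keys)


def generate(num_frames, window):
    cache = SlidingWindowKVCache(window)
    context_sizes = []
    for t in range(num_frames):
        # A real model would attend over the cached keys/values here;
        # this sketch only records how much context frame t would see.
        context_sizes.append(len(cache))
        cache.append(("k", t), ("v", t))
    return context_sizes, cache


context_sizes, cache = generate(num_frames=6, window=3)
print(context_sizes)     # [0, 1, 2, 3, 3, 3]
print(list(cache.keys))  # [('k', 3), ('k', 4), ('k', 5)]
```

The context size stops growing once the window is full, so each new frame costs about the same as the last. Cached entries are reused rather than recomputed.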

Xun Huang

@xunhuang.bsky.social

3/ Results:
⚡ 170x Faster (3.5 mins -> 1.3 sec latency)
⚡ 16x Higher throughput (0.6 -> 9.4 FPS)
🏅 Great quality (1st spot on VBench)!

December 30, 2024 at 9:06 PM
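
As a quick sanity check, the ratios can be recomputed from the rounded figures in the post. The rounded inputs give slightly different values than the headline numbers. The reported 170x presumably comes from unrounded measurements, which is an assumption here.

```python
# Figures as stated in the post (rounded).
baseline_latency_s = 3.5 * 60  # 3.5 minutes in seconds
causvid_latency_s = 1.3
baseline_fps = 0.6
causvid_fps = 9.4

latency_speedup = baseline_latency_s / causvid_latency_s
throughput_gain = causvid_fps / baseline_fps

print(round(latency_speedup, 1))  # 161.5
print(round(throughput_gain, 1))  # 15.7
```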

Xun Huang

@xunhuang.bsky.social

2/ Our Solution:
We distill a multi-step, bidirectional video diffusion model into a few-step, causal model that generates video frames on the fly. Think of it like switching from downloading a whole movie to streaming it: you can start watching as soon as the first frame is ready.

December 30, 2024 at 9:06 PM

Xun Huang

@xunhuang.bsky.social

1/ The Problem:
⏳ Current video diffusion models need several minutes to create just a 10-sec clip. Why so slow? A major issue is that these models can't show you anything until they've generated the entire video. Each frame is linked to both past and future frames through bidirectional attention.

December 30, 2024 at 9:06 PM
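
To illustrate why bidirectional attention blocks early playback, here is a toy frame-level attention mask in plain Python. It is a simplification: real models attend over many tokens per frame, and the helper names are hypothetical.

```python
def attention_mask(num_frames, causal):
    """mask[i][j] is True when frame i may attend to frame j."""
    return [
        [(j <= i) if causal else True for j in range(num_frames)]
        for i in range(num_frames)
    ]


def frames_needed_before_display(mask, frame):
    """How many frames must exist before `frame` can be finalized."""
    last_dependency = max(j for j, allowed in enumerate(mask[frame]) if allowed)
    return last_dependency + 1


bidirectional = attention_mask(4, causal=False)
causal = attention_mask(4, causal=True)

print(frames_needed_before_display(bidirectional, 0))  # 4
print(frames_needed_before_display(causal, 0))         # 1
```

With the bidirectional mask, even the first frame depends on the last one, so nothing can be shown until the whole clip is done. With the causal mask, frame 0 depends only on itself.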

Xun Huang

@xunhuang.bsky.social

🚀 Introducing CausVid: An autoregressive video generation model that starts playing instantly as you hit 'Generate,' while also securing the 1st spot on VBench!

Project Page: causvid.github.io. More details in the long thread.

December 30, 2024 at 9:06 PM