Vincent Sitzmann
vincentsitzmann.bsky.social
Professor at MIT CSAIL, leading the scene representation group (scenerepresentations.com). We are teaching AI to understand the world through perceiving and interacting with it.
We show that DFoT alone is already a competitive model, matching or beating industry SOTA models trained with far more compute than ours. Together with HG, it can stably roll out very long videos, stay robust to out-of-distribution context, and stitch sub-trajectories. (6/7)
February 11, 2025 at 8:37 PM
DFoT enables History Guidance (HG), a family of history-conditioned guidance methods that composes diffusion scores from different histories. From its simplest form to its most advanced variant, HG significantly enhances video diffusion and unlocks new abilities. (5/7)
February 11, 2025 at 8:37 PM
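For intuition only, here is a minimal sketch of what "composing diffusion scores from different histories" could look like if it follows the usual classifier-free guidance form (unconditional prediction plus weighted differences). The function name, the arguments, and the weighting scheme are all hypothetical and are not taken from the paper or its code.

```python
def compose_history_guidance(eps_uncond, eps_hists, weights):
    """Combine noise predictions in the classifier-free guidance style.

    Hypothetical sketch, not the paper's implementation.
    eps_uncond: prediction made with no history (list of floats)
    eps_hists:  predictions conditioned on different histories
    weights:    one guidance weight per history
    """
    guided = list(eps_uncond)
    for eps_h, w in zip(eps_hists, weights):
        for i, (e_h, e_u) in enumerate(zip(eps_h, eps_uncond)):
            # Push the result toward each history-conditioned prediction.
            guided[i] += w * (e_h - e_u)
    return guided
```

With a single history and weight `w`, this reduces to the standard CFG combination `eps_uncond + w * (eps_cond - eps_uncond)`, so `w = 1` simply returns the history-conditioned prediction.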
Classifier-free Guidance (CFG) has been widely used by video diffusion models to boost sample quality. However, researchers rarely perform CFG beyond the first frame. Our paper finds that an equally important conditioning variable, the history, is the long-ignored key. (2/7)
February 11, 2025 at 8:37 PM
Announcing Diffusion Forcing Transformer (DFoT), our new video diffusion algorithm that generates ultra-long videos of 800+ frames. DFoT enables History Guidance, a simple add-on to any existing video diffusion model for a quality boost. Website: boyuan.space/history-guidance (1/7)
February 11, 2025 at 8:37 PM