Jiatao Gu
jgu32.bsky.social
Machine Learning Researcher @Apple MLR
Incoming Assistant Professor @Penn CIS

See more details at https://jiataogu.me
WVD also supports controllable video generation. Given a single image, we estimate its 3D geometry via standard WVD inference and project it to obtain partial XYZ images. Finally, WVD generates the RGB images jointly with the projected XYZ images through in-painting. (6/n)
December 4, 2024 at 1:41 PM
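The projection step in the post above can be sketched as splatting estimated world-space points into a target camera to form a partial XYZ image plus a visibility mask for in-painting. This is my own simplified illustration (nearest-pixel splatting, pinhole model); the function name and interface are assumptions, not from the paper:

```python
import numpy as np

def project_xyz(xyz_img, K, world_to_cam, out_hw):
    """Splat a source-view XYZ image into a target camera, producing a
    partial XYZ image plus a visibility mask for in-painting.
    Hypothetical simplification: nearest-pixel splatting, no z-buffering."""
    pts = xyz_img.reshape(-1, 3)
    # transform world points into the target camera frame (4x4 pose)
    cam = (world_to_cam[:3, :3] @ pts.T + world_to_cam[:3, 3:4]).T
    vis = cam[:, 2] > 1e-6                      # keep points in front of camera
    uv = (K @ cam[vis].T).T
    uv = np.round(uv[:, :2] / uv[:, 2:3]).astype(int)   # perspective divide
    h, w = out_hw
    ok = (uv[:, 0] >= 0) & (uv[:, 0] < w) & (uv[:, 1] >= 0) & (uv[:, 1] < h)
    partial = np.zeros((h, w, 3))
    mask = np.zeros((h, w), dtype=bool)
    partial[uv[ok, 1], uv[ok, 0]] = pts[vis][ok]
    mask[uv[ok, 1], uv[ok, 0]] = True
    return partial, mask
```

The mask marks which target pixels received a projected point; the unmasked region is exactly what the diffusion model must in-paint.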
For example, WVD can be directly applied to various single-image tasks. WVD can also take unposed images (videos) as input and infer XYZ images via an in-painting strategy. With a post-hoc optimization procedure, the XYZ images can be converted into camera poses and depth maps. (5/n)
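The pose-recovery idea can be illustrated with a classic least-squares rigid alignment (Kabsch). This is a simplified stand-in for the paper's post-optimization, under my own assumption that camera-frame points are available (e.g. from predicted depth) alongside the world-frame XYZ predictions:

```python
import numpy as np

def fit_pose(pts_cam, pts_world):
    """Least-squares rigid transform (Kabsch algorithm) mapping
    camera-frame points onto world-frame XYZ predictions, giving the
    camera-to-world rotation R and translation t.
    Simplified stand-in for WVD's pose post-optimization."""
    mu_c, mu_w = pts_cam.mean(0), pts_world.mean(0)
    H = (pts_cam - mu_c).T @ (pts_world - mu_w)   # cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))        # avoid reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_w - R @ mu_c
    return R, t  # world = R @ cam + t
```

Running this per frame over pixel-aligned point pairs yields a camera pose per view, and the camera-frame z coordinates give the depth map.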
At inference time, this joint distribution can be leveraged to estimate conditional distributions, such as P(XYZ | RGB) or P(RGB | XYZ). This capability makes WVD a foundation for supporting a wide range of downstream tasks. (4/n)
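One common way such conditionals are realized in diffusion samplers is replacement-style in-painting: at every denoising step, re-noise the observed channels (say, RGB) to the current noise level and overwrite them, so the model only has to fill in the unknown channels (XYZ). A minimal single-step sketch; the interface and names are my own assumptions, not WVD's actual sampler:

```python
import numpy as np

def inpaint_step(x_t, known, mask, sigma, denoise_fn, rng):
    """One replacement-guidance step: clamp observed entries (mask=True)
    to a freshly noised copy of the known data, then denoise, so the
    sampler effectively draws from P(unknown | known).
    Hypothetical simplification of diffusion in-painting."""
    noised_known = known + sigma * rng.standard_normal(known.shape)
    x_t = np.where(mask, noised_known, x_t)   # overwrite observed channels
    return denoise_fn(x_t, sigma)             # model refines the rest
```

Swapping the mask between the RGB and XYZ channel blocks is what turns the same joint model into either P(XYZ | RGB) or P(RGB | XYZ).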
During training, WVD learns to generate 6D (RGB + XYZ) videos by modeling the joint probability P(RGB, XYZ), effectively capturing their interdependent structure. (3/n)
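A toy sketch of what one denoising training step over the 6-channel concatenation could look like. The loss, noise schedule, and stand-in denoiser here are hypothetical simplifications of my own; WVD's actual model is a video diffusion model with its own parameterization:

```python
import numpy as np

def joint_diffusion_loss(denoise, rgb, xyz, rng):
    """One toy denoising step over the 6-channel concatenation of RGB and
    XYZ frames, so a single network learns the joint P(RGB, XYZ).
    Hypothetical simplification: linear schedule, noise-prediction loss."""
    x0 = np.concatenate([rgb, xyz], axis=1)        # B x 6 x T x H x W
    t = rng.uniform(size=(x0.shape[0], 1, 1, 1, 1))  # per-sample time in (0, 1)
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(1 - t) * x0 + np.sqrt(t) * eps   # interpolate toward noise
    pred = denoise(x_t, t)                         # network predicts the noise
    return np.mean((pred - eps) ** 2)
```

Because RGB and XYZ are denoised as one tensor, the network is pushed to keep appearance and geometry consistent with each other.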
Existing multi-view/video diffusion models usually lack explicit 3D supervision (or guarantees), leading to potential 3D inconsistency and inefficient training.

In contrast, WVD jointly models multi-view images and explicit 3D geometry. Specifically, we represent the 3D geometry as XYZ images, which store per-pixel world coordinates. (2/n)
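The XYZ-image representation can be illustrated with a small sketch: given a depth map and camera parameters, each pixel is unprojected into world-space (x, y, z) coordinates, yielding a 3-channel map aligned with the RGB frame. The helper name and the simple pinhole setup are my own assumptions, not from the paper:

```python
import numpy as np

def depth_to_xyz_image(depth, K, cam_to_world):
    """Unproject a depth map into an 'XYZ image': a 3-channel map whose
    pixels hold world-space coordinates. Hypothetical helper using a
    pinhole model with intrinsics K and a 4x4 camera-to-world pose."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T  # 3 x HW
    rays = np.linalg.inv(K) @ pix                       # camera-space rays
    pts_cam = rays * depth.reshape(1, -1)               # scale rays by depth
    pts_h = np.vstack([pts_cam, np.ones((1, h * w))])   # homogeneous coords
    pts_world = (cam_to_world @ pts_h)[:3]              # apply pose
    return pts_world.T.reshape(h, w, 3)
```

Because every view's XYZ image lives in the same world frame, multi-view consistency becomes directly checkable pixel by pixel.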
🤔Image-to-3D, monocular depth estimation, camera pose estimation, … — can we achieve all of these easily with just ONE model?

🚀Our answer is Yes -- Excited to introduce our latest work: World-consistent Video Diffusion (WVD) with Explicit 3D Modeling!

arxiv.org/abs/2412.01821