Previously @UofT, @NVIDIAAI, @samsungresearch
Opinions are mine.
http://ashmrz.github.io
Our feedforward model takes RGB frames and predicts camera poses and dynamic 3D Gaussians. No optimization loops. No ground-truth poses. Just fast, clean reconstruction.
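A minimal sketch of what that interface could look like: RGB frames go in, per-frame camera poses and 3D Gaussian parameters come out of a single forward pass. The encoder, head sizes, and Gaussian parameterization below are hypothetical stand-ins for illustration, not the actual architecture.

```python
import torch
import torch.nn as nn

class FeedforwardRecon(nn.Module):
    """Illustrative interface only: frames -> (poses, Gaussians), no per-scene
    optimization and no ground-truth poses. All module names and sizes are
    assumptions made for this sketch."""

    def __init__(self, dim: int = 256):
        super().__init__()
        self.encoder = nn.Conv2d(3, dim, kernel_size=16, stride=16)  # patchify stand-in
        self.pose_head = nn.Linear(dim, 12)      # flattened 3x4 camera-to-world matrix
        self.gauss_head = nn.Linear(dim, 14)     # xyz(3)+scale(3)+quat(4)+opacity(1)+rgb(3)

    def forward(self, frames: torch.Tensor):
        # frames: [T, 3, H, W] -> patch tokens [T, N, dim]
        tokens = self.encoder(frames).flatten(2).transpose(1, 2)
        poses = self.pose_head(tokens.mean(dim=1))   # one pose per frame
        gaussians = self.gauss_head(tokens)          # one Gaussian per patch token
        return poses, gaussians


frames = torch.randn(8, 3, 256, 256)                 # 8 RGB frames
poses, gaussians = FeedforwardRecon()(frames)
print(poses.shape, gaussians.shape)                  # [8, 12], [8, 256, 14]
```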
We fuse spatial and temporal attention into a single transformer layer. This view-time attention lets our diffusion model reason across viewpoints and frames jointly, without extra parameters. The parameter efficiency also improves stability.
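One way to picture fused view-time attention, under my own reading: a single self-attention applied over the joint view-by-time token grid, so the same projection weights serve both axes and no parameters are added beyond an ordinary attention block. This is an illustrative sketch with assumed tensor layout and module names, not the paper's implementation.

```python
import torch
import torch.nn as nn

class ViewTimeAttention(nn.Module):
    """Sketch: one shared self-attention over views and frames jointly.

    Tokens are assumed to arrive as [batch, views, frames, tokens, dim];
    flattening views * frames * tokens into one sequence lets a single
    attention layer attend across viewpoints and time at once."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, v, f, n, d = x.shape
        seq = x.reshape(b, v * f * n, d)            # joint view-time sequence
        h = self.norm(seq)
        out, _ = self.attn(h, h, h)
        return (seq + out).reshape(b, v, f, n, d)   # residual, restore layout


# Toy usage: 2 views, 4 frames, 16 tokens per frame, 64-dim features.
x = torch.randn(1, 2, 4, 16, 64)
print(ViewTimeAttention(64)(x).shape)               # torch.Size([1, 2, 4, 16, 64])
```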
snap-research.github.io/4Real-Video-...