Have you already taken off or still around for a bit?
🌐 Project page: snap-research.github.io/4Real-Video-V2
📜 Abstract: arxiv.org/abs/2506.18839
*equal contribution
Our feedforward model takes RGB frames and predicts camera poses and dynamic 3D Gaussians. No optimization loops. No ground-truth poses. Just fast, clean reconstruction.
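A rough sketch of that interface (my own PyTorch pseudocode, not the released model; the backbone, head sizes, and Gaussian parameterization are all assumptions): one forward pass maps a clip of RGB frames to per-frame camera poses and per-frame dynamic Gaussian parameters, with no per-scene optimization and no ground-truth poses as input.

```python
import torch
import torch.nn as nn

class FeedforwardRecon(nn.Module):
    """Sketch: RGB frames in, camera poses + dynamic 3D Gaussians out, in a single pass."""

    def __init__(self, dim=256, gaussians_per_frame=1024):
        super().__init__()
        self.g = gaussians_per_frame
        # Hypothetical patch encoder: 8x8 patches -> dim-channel feature map.
        self.encoder = nn.Conv2d(3, dim, kernel_size=8, stride=8)
        self.pose_head = nn.Linear(dim, 7)            # translation (3) + quaternion (4)
        # Assumed per-Gaussian params: mean(3)+scale(3)+rotation(4)+opacity(1)+RGB(3)+velocity(3) = 17
        self.gaussian_head = nn.Linear(dim, gaussians_per_frame * 17)

    def forward(self, frames):                        # frames: (B, T, 3, H, W)
        B, T, C, H, W = frames.shape
        feats = self.encoder(frames.flatten(0, 1))    # (B*T, dim, h, w)
        pooled = feats.mean(dim=(-2, -1))             # (B*T, dim) global descriptor
        poses = self.pose_head(pooled).view(B, T, 7)  # per-frame camera pose
        gaussians = self.gaussian_head(pooled).view(B, T, self.g, 17)
        return poses, gaussians                       # no optimization loop, no GT poses

# Usage: poses, gaussians = FeedforwardRecon()(torch.rand(1, 8, 3, 256, 256))
```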
We fuse spatial and temporal attention into a single transformer layer. This view-time attention lets our diffusion model reason jointly across viewpoints and frames, without extra parameters. The parameter efficiency also improves stability.
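To make the fusion concrete, here is a hedged PyTorch sketch (shapes, token layout, and module choices are my assumptions, not the paper's code): a single attention layer runs over the flattened view × time × token sequence, so cross-view and cross-frame reasoning share one set of weights instead of using separate spatial and temporal blocks.

```python
import torch
import torch.nn as nn

class ViewTimeAttention(nn.Module):
    def __init__(self, dim=512, heads=8):
        super().__init__()
        # One set of attention weights, shared for the joint view-time pass (no extra parameters).
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):
        # x: (B, V, T, N, D) -- batch, views, frames, spatial tokens, channels
        B, V, T, N, D = x.shape
        tokens = x.reshape(B, V * T * N, D)     # flatten views and frames into one sequence
        h = self.norm(tokens)
        out, _ = self.attn(h, h, h)             # joint attention over viewpoints and time
        return (tokens + out).reshape(B, V, T, N, D)

# Usage: y = ViewTimeAttention()(torch.rand(2, 4, 8, 64, 512))
```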
Toshiya Yura, @ashmrz.bsky.social, Igor Gilitschenski 5/🧵
arxiv.org/abs/2412.07293