Ashkan Mirzaei
@ashmrz.bsky.social
Research Scientist @Snap
Previously @UofT, @NVIDIAAI, @samsungresearch
Opinions are mine.

http://ashmrz.github.io
Super cool work Masha, congrats!
August 7, 2025 at 10:14 PM
Congrats, Kosta! That sounds incredible. Wishing you an amazing year ahead full of great people, new ideas, and exciting experiences.

Have you already taken off or still around for a bit?
July 1, 2025 at 10:31 PM
Super insightful!
June 25, 2025 at 1:43 PM
[9/9] 4Real-Video-V2: Fused View-Time Attention and Feedforward Reconstruction for 4D Scene Generation

🌐 Project page: snap-research.github.io/4Real-Video-V2
📜 Abstract: arxiv.org/abs/2506.18839
June 24, 2025 at 2:13 PM
[8/9] Authors: Chaoyang Wang*, Ashkan Mirzaei*, Vidit Goel, Willi Menapace, Aliaksandr Siarohin, Avalon Vinella, Michael Vasilkovsky, Ivan Skorokhodov, Vladislav Shakhrai, Sergey Korolev, Sergey Tulyakov, Peter Wonka.
*equal contribution
June 24, 2025 at 2:13 PM
[7/9] 🧠 We use a camera token replacement trick for temporal consistency of the camera poses, temporal attention layers to share info over time, and a "Gaussian head" to predict shape, scale, opacity, and color offsets.
June 24, 2025 at 2:13 PM
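A minimal PyTorch sketch of the ideas in [7/9] — not the released code. A temporal attention block shares information across frames, and a small "Gaussian head" maps per-pixel features to shape, scale, opacity, and color offsets; the module names, shapes, and the 10-channel output layout are illustrative assumptions.

```python
# Illustrative sketch (not the 4Real-Video-V2 implementation) of a temporal
# attention block plus a "Gaussian head" that predicts parameter offsets.
import torch
import torch.nn as nn

class TemporalBlock(nn.Module):
    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):  # x: (batch*views*pixels, time, dim)
        h = self.norm(x)
        out, _ = self.attn(h, h, h)   # attend only along the time axis
        return x + out

class GaussianHead(nn.Module):
    """Predicts offsets for shape (xyz), scale, opacity, and color."""
    def __init__(self, dim: int):
        super().__init__()
        # 3 (xyz) + 3 (scale) + 1 (opacity) + 3 (color) = 10 channels (assumed layout)
        self.proj = nn.Linear(dim, 10)

    def forward(self, feats):          # feats: (..., dim)
        out = self.proj(feats)
        xyz, scale, opacity, color = out.split([3, 3, 1, 3], dim=-1)
        return xyz, scale.exp(), opacity.sigmoid(), color

# toy usage: 2 views x 4 frames of 16x16 feature maps with dim 64
feats = torch.randn(2 * 16 * 16, 4, 64)
feats = TemporalBlock(64)(feats)
xyz, scale, opacity, color = GaussianHead(64)(feats)
print(xyz.shape, scale.shape, opacity.shape, color.shape)
```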
[6/9] 🔁 How it works – Stage 2 (Reconstruction):
Our feedforward model takes RGB frames and predicts camera poses and dynamic 3D Gaussians. No optimization loops. No ground-truth poses. Just fast, clean reconstruction.
June 24, 2025 at 2:13 PM
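A minimal sketch of the feedforward reconstruction idea in [6/9]: RGB frames go in, per-frame camera poses and per-pixel Gaussian parameters come out in a single forward pass, with no per-scene optimization. The backbone, heads, and channel layout here are stand-in assumptions, not the actual 4Real-Video-V2 reconstructor.

```python
# Illustrative feedforward reconstructor: frames -> (camera poses, Gaussians).
import torch
import torch.nn as nn

class FeedforwardReconstructor(nn.Module):
    def __init__(self, dim: int = 128):
        super().__init__()
        # simple conv encoder standing in for the real backbone
        self.encoder = nn.Sequential(
            nn.Conv2d(3, dim, 4, stride=4), nn.GELU(),
            nn.Conv2d(dim, dim, 3, padding=1), nn.GELU(),
        )
        self.pose_head = nn.Linear(dim, 7)        # translation (3) + quaternion (4)
        self.gauss_head = nn.Conv2d(dim, 14, 1)   # xyz, scale, rotation(4), opacity, rgb

    def forward(self, frames):                    # frames: (B, T, 3, H, W)
        B, T, C, H, W = frames.shape
        feats = self.encoder(frames.flatten(0, 1))         # (B*T, dim, h, w)
        pooled = feats.mean(dim=(2, 3))                     # one token per frame
        poses = self.pose_head(pooled).view(B, T, 7)        # per-frame camera pose
        gaussians = self.gauss_head(feats).view(B, T, 14, *feats.shape[-2:])
        return poses, gaussians

model = FeedforwardReconstructor()
poses, gaussians = model(torch.randn(1, 4, 3, 64, 64))
print(poses.shape, gaussians.shape)   # (1, 4, 7) and (1, 4, 14, 16, 16)
```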
[5/9] ⚡ The architecture runs on a DiT backbone. Thanks to sparse attention and temporal compression, we keep things efficient. Only the self-attention layers are fine-tuned; everything else stays frozen.
June 24, 2025 at 2:13 PM
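A minimal sketch of the parameter-efficient fine-tuning strategy in [5/9]: freeze a pretrained backbone and re-enable gradients only for its self-attention layers. The toy block and the "attn" naming convention are assumptions for illustration.

```python
# Illustrative freezing strategy: train only self-attention, freeze the rest.
import torch.nn as nn

def freeze_except_self_attention(model: nn.Module, attn_keyword: str = "attn"):
    """Freeze all parameters, then re-enable gradients for self-attention modules."""
    for p in model.parameters():
        p.requires_grad = False
    for name, module in model.named_modules():
        if attn_keyword in name and isinstance(module, nn.MultiheadAttention):
            for p in module.parameters():
                p.requires_grad = True
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    total = sum(p.numel() for p in model.parameters())
    print(f"trainable: {trainable}/{total} params")

# toy stand-in for a transformer block: attention + MLP
block = nn.ModuleDict({
    "attn": nn.MultiheadAttention(64, 8, batch_first=True),
    "mlp": nn.Sequential(nn.Linear(64, 256), nn.GELU(), nn.Linear(256, 64)),
})
freeze_except_self_attention(block)
```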
[4/9] 🧠 How it works – Stage 1 (Generation):
We fuse spatial and temporal attention into a single transformer layer. This view-time attention lets our diffusion model reason across viewpoints and frames jointly, without adding extra parameters. That parameter efficiency also makes training more stable.
June 24, 2025 at 2:13 PM
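A minimal sketch of the fused view-time attention idea in [4/9]: instead of separate spatial and temporal attention layers, one self-attention call runs over tokens from all views and all frames at once, so pretrained attention weights can be reused without adding parameters. Shapes and names are illustrative, not the paper's exact layer.

```python
# Illustrative fused view-time attention: one self-attention over (views x time x tokens).
import torch
import torch.nn as nn

class FusedViewTimeAttention(nn.Module):
    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        # pretrained self-attention weights could be loaded into this module
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):
        # x: (batch, views, time, tokens, dim) -> flatten views/time/tokens into one sequence
        B, V, T, N, D = x.shape
        seq = self.norm(x).reshape(B, V * T * N, D)
        out, _ = self.attn(seq, seq, seq)   # every token attends across views and frames
        return x + out.reshape(B, V, T, N, D)

x = torch.randn(1, 4, 8, 16, 64)            # 4 views, 8 frames, 16 tokens each
y = FusedViewTimeAttention(64)(x)
print(y.shape)                               # torch.Size([1, 4, 8, 16, 64])
```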
[3/9] High-quality 4D training data is scarce, and large video models are expensive to fine-tune. So we focus on parameter efficiency. Our fused attention design reuses pretrained weights with minimal changes. It trains fast, generalizes well, and scales to full 4D scenes.
June 24, 2025 at 2:13 PM
[2/9] We generate synchronized multi-view video grids, then lift them into 4D geometry using a fast feedforward network. The result is a set of Gaussian particles, ready for rendering, exploration, and editing.
June 24, 2025 at 2:13 PM
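A two-stage data-flow sketch for the pipeline in [2/9], with stub functions standing in for the real diffusion model and reconstructor. The point is only the flow: a synchronized (views x frames) RGB grid in, camera poses and Gaussian particles out; all shapes and names are assumptions.

```python
# Illustrative pipeline stubs: generate a multi-view video grid, then lift it to 4D.
import torch

def generate_multiview_grid(views: int = 4, frames: int = 8, res: int = 64):
    """Stage 1 stub: a synchronized multi-view video grid of shape (V, T, 3, H, W)."""
    return torch.rand(views, frames, 3, res, res)

def lift_to_4d(grid: torch.Tensor):
    """Stage 2 stub: feedforward lift to per-frame poses and Gaussian particles."""
    V, T, _, H, W = grid.shape
    poses = torch.zeros(V, T, 7)                 # translation (3) + quaternion (4)
    gaussians = torch.zeros(V, T, H * W, 14)     # xyz, scale, rotation, opacity, rgb
    return poses, gaussians

grid = generate_multiview_grid()
poses, gaussians = lift_to_4d(grid)
print(grid.shape, poses.shape, gaussians.shape)
```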
Reposted by Ashkan Mirzaei
📹 EventSplat: 3D Gaussian Splatting from Moving Event Cameras for Real-time Rendering
Toshiya Yura, @ashmrz.bsky.social, Igor Gilitschenski 5/🧵
arxiv.org/abs/2412.07293
March 3, 2025 at 7:47 PM
Huge congrats Kosta🎉
November 25, 2024 at 7:09 PM