Alexandre Morgand, PhD
alexmrgd.bsky.social
Computer Vision Research Scientist at @Simulon , music lover, fond of scientific/musical/geeky/useless stuff
"No Pose at All Self-Supervised Pose-Free 3DGS from Sparse Views"
TLDR: 3DGS + no poses during training/inference; shared feature extraction backbone; simultaneous prediction of 3D Gaussian primitives+camera poses in a canonical space from unposed (1 feed-forward step).
August 7, 2025 at 3:49 PM
"Any-to-Bokeh: One-Step Video Bokeh via Multi-Plane Image Guided Diffusion"

📖TL;DR: Any-to-Bokeh is a novel one-step video bokeh framework that converts arbitrary input videos into temporally coherent, depth-aware bokeh effects.
June 13, 2025 at 1:43 PM
"QUEEN: QUantized Efficient ENcoding of Dynamic Gaussians for Streaming Free-viewpoint Videos"

TL;DR: Efficient representations for streamable free-viewpoint videos with dynamic Gaussians. Reduces model size to just 0.7 MB per frame while training in < 5 s and rendering at 350 FPS.
June 11, 2025 at 9:13 AM
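QUEEN's headline numbers come from compressing per-frame Gaussian attribute updates. A minimal sketch of the underlying idea, uniform quantization of per-frame residuals, assuming an 8-bit code; the names and settings are illustrative, not the paper's actual codec:

```python
import numpy as np

def quantize(x, n_bits=8):
    """Uniform quantization: store integer codes + (min, scale)
    instead of full float32 tensors. Illustrative only."""
    lo, hi = x.min(), x.max()
    scale = (hi - lo) / (2**n_bits - 1)
    codes = np.round((x - lo) / scale).astype(np.uint8)
    return codes, lo, scale

def dequantize(codes, lo, scale):
    return codes.astype(np.float32) * scale + lo

# toy per-frame residuals for 10k Gaussians (3 attributes each)
residuals = np.random.default_rng(0).normal(size=(10_000, 3)).astype(np.float32)
codes, lo, scale = quantize(residuals)
recon = dequantize(codes, lo, scale)
print(codes.nbytes / residuals.nbytes)  # 0.25: 8-bit codes vs float32
```

The real method combines learned quantization with sparsity of the per-frame updates; this only shows why integer codes shrink the stored representation.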
"STORM: Spatio-Temporal Reconstruction Model for Large-Scale Outdoor Scenes"

TL;DR: Data-driven transformer in a feed-forward manner; dense reconstruction of dynamic environments with 3D Gaussians and velocities; self-supervised scene flow
May 20, 2025 at 4:49 PM
St4RTrack: Simultaneous 4D Reconstruction and Tracking in the World

TL;DR: feed-forward model that simultaneously reconstructs and tracks dynamic video content; DUSt3R-like pointmaps for a pair of frames captured at different moments (1/2)
April 22, 2025 at 4:30 PM
FLARE: Feed-forward Geometry, Appearance and Camera Estimation from Uncalibrated Sparse Views

TL;DR: feed-forward model; cascaded learning paradigm with camera pose serving as the critical bridge, recognizing its essential role in mapping 3D structures onto 2D image planes.
March 14, 2025 at 10:21 AM
⚡️Fast3R: Towards 3D Reconstruction of 1000+ Images in One Forward Pass

TL;DR: multi-view generalization of DUSt3R; processes many views in parallel: a Transformer-based architecture forwards N images in a single pass, bypassing the need for iterative alignment.
March 13, 2025 at 10:07 AM
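The single-forward-pass idea can be sketched with toy shapes: all N views go through one shared network call, with no pairwise loop or post-hoc alignment. The matrices below stand in for Fast3R's shared encoder and fusion transformer and are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

H = W = 8                             # tiny image resolution
D = 16                                # token dimension
patch_embed = rng.normal(size=(3, D)) # stand-in for the shared ViT encoder
head = rng.normal(size=(D, 3))        # stand-in for the pointmap decoder

def forward_all_views(images):
    """Process all N views in a single forward pass.

    images: (N, H, W, 3) -> pointmaps: (N, H, W, 3), one per view,
    expressed in one shared frame. A real model would run joint
    attention across all N*H*W tokens between the two projections.
    """
    tokens = images @ patch_embed     # (N, H, W, D), all views at once
    pointmaps = tokens @ head         # (N, H, W, 3)
    return pointmaps

views = rng.normal(size=(5, H, W, 3)) # 5 unposed views
pts = forward_all_views(views)
print(pts.shape)                      # (5, 8, 8, 3), no iterative alignment
```

The point of the sketch is the batching structure, not the network itself: the view dimension is just another batch axis, which is what lets the approach scale to 1000+ images.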
🪄 VACE: All-in-One Video Creation and Editing

from @alibabagroup.bsky.social's Tongyi Lab with:

Zeyinzi Jiang* Zhen Han* Chaojie Mao*† Jingfeng Zhang Yulin Pan Yu Liu

*Equal contribution, †Project lead
March 12, 2025 at 8:39 AM
Difix3D+: Improving 3D Reconstructions with Single-Step Diffusion Models

TL;DR: a single-step image diffusion model trained to enhance rendered novel views and remove artifacts caused by underconstrained regions of the 3D representation.
March 10, 2025 at 8:43 AM
A Distractor-Aware Memory (DAM) for Visual Object Tracking with SAM2

TL;DR: SAM2.1 based; distractor-distilled (DiDi) dataset to better study the distractor problem
March 5, 2025 at 8:48 AM
CAST: Component-Aligned 3D Scene Reconstruction from an RGB Image

TL;DR: object-level 2D segmentation+relative depth; GPT-based model to analyze inter-object spatial relationships; occlusion-aware large-scale 3D generation model
March 4, 2025 at 8:41 AM
Are diffusion models falling for optical illusions?

"The Art of Deception: Color Visual Illusions and Diffusion Models"

TL;DR: Diffusion models exhibit human-like perceptual shifts in brightness and color within their latent space.
February 28, 2025 at 3:15 PM
Does 3D Gaussian Splatting Need Accurate Volumetric Rendering?

TL;DR: While more accurate volumetric rendering can help for low numbers of primitives, efficient optimization + large number of Gaussians allows 3DGS to outperform volumetric rendering despite its approximations
February 27, 2025 at 9:58 AM
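For context, the approximation in question is 3DGS's front-to-back alpha compositing of sorted, projected Gaussians instead of exact volumetric integration along the ray. A minimal per-pixel sketch (illustrative, not the paper's renderer):

```python
import numpy as np

def composite_alpha(colors, alphas):
    """3DGS-style front-to-back alpha compositing for one pixel.

    colors: (N, 3) per-Gaussian RGB, alphas: (N,) per-Gaussian opacity
    after projection, sorted nearest first. Each Gaussian contributes
    its color weighted by its alpha and the accumulated transmittance.
    """
    out = np.zeros(3)
    transmittance = 1.0
    for c, a in zip(colors, alphas):
        out += transmittance * a * c
        transmittance *= (1.0 - a)
    return out

# toy example: two primitives along one ray, nearest first
colors = np.array([[1.0, 0.0, 0.0],   # red, nearest
                   [0.0, 0.0, 1.0]])  # blue, behind it
alphas = np.array([0.6, 0.8])
pixel = composite_alpha(colors, alphas)
# red contributes 0.6, blue contributes (1 - 0.6) * 0.8 = 0.32
```

The paper's question is whether replacing this per-primitive compositing with finer volumetric quadrature pays off; the answer above is "mostly no, once you have enough Gaussians".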
The NeRF's vengeance?

"Sparse Voxels Rasterization: Real-time High-fidelity Radiance Field Rendering"
February 24, 2025 at 10:09 AM
Self-Calibrating Gaussian Splatting for Large Field of View Reconstruction

TL;DR: Self-calibration + a cubemap-based resampling strategy to support large-FOV images
February 20, 2025 at 8:50 AM
Animate Anyone 2: High-Fidelity Character Image Animation with Environment Affordance

TL;DR: motion from a source video + environmental representations captured as conditional inputs; shape-agnostic mask strategy for the character/environment relationship.
February 18, 2025 at 4:26 PM
Pippo : High-Resolution Multi-View Humans from a Single Image

TL;DR: 1K Multiview Diffusion Transformer pre-trained on 3B Human images without captions; post-trained on 2.5K studio captures with pixel-aligned control via ControlMLP; generates > 5x views at inference
February 18, 2025 at 10:16 AM
Since 2024, it's crazy how competitive the field of generative video is. Here is another player but open source this time!

The University of Hong Kong and ByteDance present "Goku: Flow Based Video Generative Foundation Models"
February 14, 2025 at 9:21 AM
📜 Fillerbuster: Multi-View Scene Completion for Casual Captures

TL;DR: Unified framework for scene completion; jointly models images and camera poses to reconstruct missing parts of casually captured scenes; 1B-parameter diffusion model trained from scratch.
February 12, 2025 at 9:02 AM
Diffusion as Shader: 3D-aware Video Diffusion for Versatile Video Generation Control

TL;DR: manipulating 3D tracking videos to link frames, significantly enhancing temporal consistency of the generated videos; 3 days of training on 8 H800 GPUs using fewer than 10k videos
February 11, 2025 at 8:42 AM
Zero-Shot Novel View and Depth Synthesis with Multi-View Geometric Diffusion

TL;DR: diffusion-based; raymap conditioning to augment visual features with spatial information from different viewpoints; multi-task generation of images and depth maps
February 4, 2025 at 9:47 AM
DiffVSR: Enhancing Real-World Video Super-Resolution with Diffusion Models for Advanced Visual Quality and Temporal Consistency

TL;DR: multi-scale temporal attention module for spatial accuracy. Noise rescheduling mechanism & latent transition approach for temporal consistency
February 3, 2025 at 11:10 AM
CubeDiff: Repurposing Diffusion-Based Image Models for Panorama Generation

TL;DR: 360° panoramas from diffusion-based image models. Combining cubemap representations with fine-tuning of pretrained txt2img models, CubeDiff simplifies panorama generation, delivering high-quality, consistent panoramas.
January 31, 2025 at 10:03 AM
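The cubemap representation maps each face pixel to a view direction, which is what lets a pretrained square-image model tile a full 360° panorama. A minimal sketch, assuming one common face/axis convention (the paper's exact layout may differ):

```python
import numpy as np

def cubemap_dir(face, u, v):
    """Map cubemap face coordinates (u, v in [-1, 1]) to a unit
    view direction. Face convention here is illustrative."""
    d = {
        "+x": np.array([ 1.0,  -v,  -u]),
        "-x": np.array([-1.0,  -v,   u]),
        "+y": np.array([  u,  1.0,   v]),
        "-y": np.array([  u, -1.0,  -v]),
        "+z": np.array([  u,  -v,  1.0]),
        "-z": np.array([ -u,  -v, -1.0]),
    }[face]
    return d / np.linalg.norm(d)

# the center of the +z face looks straight down the +z axis
center = cubemap_dir("+z", 0.0, 0.0)
```

Rendering each of the six faces with a standard 90° FOV perspective model, then stitching via these directions, is why no panorama-specific architecture is needed.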
BLADE: Single-view Body Mesh Learning through Accurate Depth Estimation

TL;DR: fully perspective projection model without applying heuristics; depth, focal parameters, 3D pose, and 2D alignment estimation
January 29, 2025 at 8:25 AM
Continuous 3D Perception Model with Persistent State

TL;DR: An online 3D reasoning framework for various 3D tasks from only RGB inputs
January 27, 2025 at 9:37 AM