Kwang Moo Yi
@kmyid.bsky.social
Assistant Professor of Computer Science at the University of British Columbia. I also post my daily finds on arXiv.
Ren and Wen et al., "FastGS: Training 3D Gaussian Splatting in 100 Seconds"
I like simple ideas -- this one says you should consider multiple views when you prune/clone, which allows fewer Gaussians to be used for training.
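Roughly the flavor, as a toy sketch (my own illustration -- the per-view importance score, threshold, and vote count below are made-up placeholders, not the paper's actual criterion):

```python
import torch

def multiview_prune_mask(per_view_importance: torch.Tensor,
                         thresh: float = 1e-3, min_views: int = 3) -> torch.Tensor:
    """per_view_importance: [V, N] contribution of each of N Gaussians to each
    of V training views. Keep a Gaussian only if enough views agree it matters,
    instead of pruning/cloning from a single view's statistics."""
    significant = per_view_importance > thresh      # [V, N] boolean votes
    votes = significant.sum(dim=0)                  # [N] number of supporting views
    return votes >= min_views                       # True = keep, False = prune
```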
November 7, 2025 at 6:32 PM
Gao and Mao et al., "Seeing the Wind from a Falling Leaf"
Extract Dynamic 3D Gaussians for an object -> Vision Language Models to extract physics parameters -> model force field (wind). Leads to some fun.
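For the force-field step, I picture something like this back-of-the-envelope version (entirely my own illustration; mass and drag coefficient stand in for the VLM-extracted physics parameters, and the paper's model is surely richer):

```python
import numpy as np

def estimate_wind_force(pos, dt, mass, drag_coeff):
    """pos: [T, 3] tracked leaf positions (e.g. centers of the dynamic Gaussians).
    The 'wind' is read off as whatever force gravity and simple quadratic drag
    cannot explain in the observed acceleration."""
    g = np.array([0.0, 0.0, -9.81])
    vel = np.gradient(pos, dt, axis=0)                                  # [T, 3]
    acc = np.gradient(vel, dt, axis=0)                                  # [T, 3]
    drag = -drag_coeff * np.linalg.norm(vel, axis=1, keepdims=True) * vel
    return mass * acc - mass * g - drag                                 # [T, 3]
```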
November 5, 2025 at 5:31 PM
Zhou et al., "PAGE-4D: Disentangled Pose and Geometry Estimation for 4D Perception"
VGGT extended to dynamic scenes with a dynamic mask predictor.
November 4, 2025 at 8:17 PM
Tesfaldet et al., "Generative Point Tracking with Flow Matching"
Tracking, waaaaaay back in the day, used to be solved with sampling methods. They are now back. Also reminds me of my first major conference work, where I looked into how much impact the initial target point has.
October 31, 2025 at 6:42 PM
Bai et al., "Positional Encoding Field"
Make your RoPE encoding 3D by including a z axis, then manipulate your image by simply manipulating your positional encoding in 3D --> novel view synthesis. Neat idea.
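A toy version of the "give RoPE a z axis" part (my sketch; the axial channel split and frequency choice are assumptions, and the paper does more than this):

```python
import torch

def rope_3d(q: torch.Tensor, coords: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """q: [N, D] token features, coords: [N, 3] (x, y, z) per token.
    Axial RoPE: split channels into three groups and rotate each group by
    angles derived from one coordinate axis. Shifting z in `coords` then acts
    like moving the token in depth, which is the knob you manipulate."""
    N, D = q.shape
    d = D // 3 // 2 * 2                        # even number of channels per axis
    out = q.clone()
    for a in range(3):
        seg = q[:, a * d:(a + 1) * d]
        half = d // 2
        freqs = base ** (-torch.arange(half, dtype=q.dtype) / half)   # [half]
        ang = coords[:, a:a + 1] * freqs                               # [N, half]
        cos, sin = ang.cos(), ang.sin()
        x1, x2 = seg[:, :half], seg[:, half:]
        out[:, a * d:(a + 1) * d] = torch.cat([x1 * cos - x2 * sin,
                                               x1 * sin + x2 * cos], dim=-1)
    return out
```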
October 24, 2025 at 6:20 PM
Choudhury and Kim et al., "Accelerating Vision Transformers With Adaptive Patch Sizes"
Transformer patches don't need to be of uniform size -- choose sizes based on entropy --> faster training/inference. Are scale-spaces gonna make a comeback?
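Something like this quadtree-ish toy version (the threshold and the histogram-entropy score are my placeholders, not the paper's exact rule):

```python
import numpy as np

def patch_sizes_by_entropy(gray: np.ndarray, coarse: int = 32, fine: int = 16,
                           thresh: float = 4.0):
    """gray: [H, W] uint8 image. For each coarse patch, compute its intensity
    histogram entropy; high-entropy (detailed) regions get subdivided into
    finer patches, flat regions stay coarse. Returns a list of (y, x, size)."""
    H, W = gray.shape
    patches = []
    for y in range(0, H - coarse + 1, coarse):
        for x in range(0, W - coarse + 1, coarse):
            p = gray[y:y + coarse, x:x + coarse]
            hist = np.bincount(p.ravel(), minlength=256) / p.size
            ent = -(hist[hist > 0] * np.log2(hist[hist > 0])).sum()
            if ent > thresh:                        # detailed -> finer patches
                for dy in range(0, coarse, fine):
                    for dx in range(0, coarse, fine):
                        patches.append((y + dy, x + dx, fine))
            else:                                   # flat -> one big patch
                patches.append((y, x, coarse))
    return patches
```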
October 22, 2025 at 8:08 PM
Hakie and Lu et al., "Fix False Transparency by Noise Guided Splatting"
Pretty cute idea -- Gaussian splats are often transparent, although we don't want them to be. So, just fill your splats in with noise during optimization to make them non-transparent.
October 20, 2025 at 9:35 PM
Alzayer et al., "Coupled Diffusion Sampling for Training-Free Multi-View Image Editing"
You can "guide" diffusion models with different purposes by "coupling them". Our group did simply weighted averaging without math in vivid-123, but this is much more sound!
You can "guide" diffusion models with different purposes by "coupling them". Our group did simply weighted averaging without math in vivid-123, but this is much more sound!
October 17, 2025 at 7:00 PM
Alzayer et al., "Coupled Diffusion Sampling for Training-Free Multi-View Image Editing"
You can "guide" diffusion models with different purposes by "coupling them". Our group did simply weighted averaging without math in vivid-123, but this is much more sound!
You can "guide" diffusion models with different purposes by "coupling them". Our group did simply weighted averaging without math in vivid-123, but this is much more sound!
Bruns et al., "ACE-G: Improving Generalization of Scene Coordinate Regression Through Query Pre-Training"
Train a scene coordinate regressor with "map codes" (i.e., trainable inputs) so that you can train one generalizable regressor. Then, find these "map codes" to localize.
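My mental model of the map-code trick, as a toy sketch (the architecture, sizes, and the plain MSE fit are my placeholders; the real training objective is different):

```python
import torch

class CoordRegressor(torch.nn.Module):
    """Generalizable scene-coordinate regressor conditioned on a per-scene,
    trainable 'map code' fed in alongside the image features."""
    def __init__(self, feat_dim=128, code_dim=64):
        super().__init__()
        self.mlp = torch.nn.Sequential(
            torch.nn.Linear(feat_dim + code_dim, 256), torch.nn.ReLU(),
            torch.nn.Linear(256, 3))                        # 3D scene coordinate

    def forward(self, feats, map_code):                     # feats: [N, feat_dim]
        code = map_code.expand(feats.shape[0], -1)          # same code for every pixel
        return self.mlp(torch.cat([feats, code], dim=-1))

def fit_map_code(model, feats, target_xyz, code_dim=64, steps=500):
    """New scene: freeze the shared regressor and optimize only the map code
    on the scene's mapping data (MSE here is a stand-in for the real loss)."""
    for p in model.parameters():
        p.requires_grad_(False)
    code = torch.zeros(1, code_dim, requires_grad=True)
    opt = torch.optim.Adam([code], lr=1e-2)
    for _ in range(steps):
        opt.zero_grad()
        loss = torch.nn.functional.mse_loss(model(feats, code), target_xyz)
        loss.backward()
        opt.step()
    return code.detach()
```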
October 16, 2025 at 7:37 PM
Shrivastava and Mehta et al., "Point Prompting: Counterfactual Tracking with Video Diffusion Models"
Put a red dot where you want to track, and SDEdit the video with a video model --> zero-shot point tracking. Not as good as supervised ones, but zero-shot!
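The whole recipe fits in a few lines if you treat the video model as a black box -- below, the SDEdit pass itself is assumed to happen elsewhere, and I just paint the prompt and read the track back out:

```python
import numpy as np

def draw_dot(frame, y, x, r=4):
    """Paint a red dot on the query point of the first frame (uint8 RGB)."""
    f = frame.copy()
    ys, xs = np.ogrid[:f.shape[0], :f.shape[1]]
    mask = (ys - y) ** 2 + (xs - x) ** 2 <= r ** 2
    f[mask] = (255, 0, 0)
    return f

def track_red_dot(video):
    """video: [T, H, W, 3] frames after SDEdit-style regeneration by a video
    diffusion model (done externally). Read the track back as the reddest
    pixel in each frame."""
    redness = video[..., 0].astype(float) - video[..., 1:].astype(float).mean(-1)
    flat = redness.reshape(video.shape[0], -1).argmax(axis=1)
    return np.stack(np.unravel_index(flat, video.shape[1:3]), axis=1)  # [T, (y, x)]
```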
October 15, 2025 at 6:39 PM
Xu et al., "ReSplat: Learning Recurrent Gaussian Splats"
Feed-forward Gaussian Splatting + Learned Corrector = fast, high-quality reconstruction. Uses global + kNN attention. Reminds me of PointNet++.
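The kNN-attention bit is what triggers the PointNet++ flashback; a minimal single-head sketch of that flavor (my simplification, not the paper's layer):

```python
import torch

def knn_attention(feats, xyz, k=16):
    """Local attention where each Gaussian attends only to its k nearest
    neighbours in 3D. feats: [N, D] per-Gaussian features, xyz: [N, 3] centers."""
    dist = torch.cdist(xyz, xyz)                          # [N, N] pairwise distances
    idx = dist.topk(k, largest=False).indices             # [N, k] neighbour ids (incl. self)
    q = feats.unsqueeze(1)                                 # [N, 1, D] queries
    kv = feats[idx]                                        # [N, k, D] neighbour features
    attn = torch.softmax((q * kv).sum(-1) / feats.shape[-1] ** 0.5, dim=-1)  # [N, k]
    return (attn.unsqueeze(-1) * kv).sum(1)                # [N, D] aggregated update
```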
October 10, 2025 at 7:23 PM
Xu and Lin et al., "Pixel-Perfect Depth with Semantics-Prompted Diffusion Transformers"
Append foundation-model features at the later stages of Marigold-like denoising to get monocular depth. Simple, straightforward idea that works.
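Something in this spirit, I imagine (a toy stand-in; the projection, block shape, and where exactly the features enter are my assumptions):

```python
import torch

class SemanticsPromptedBlock(torch.nn.Module):
    """Toy stand-in for a late denoiser block that also sees foundation-model
    features: the semantic features are projected and added to the tokens only
    in the later blocks, leaving the early, high-noise blocks untouched."""
    def __init__(self, dim=384, sem_dim=768):
        super().__init__()
        self.proj = torch.nn.Linear(sem_dim, dim)
        self.block = torch.nn.TransformerEncoderLayer(dim, nhead=6, batch_first=True)

    def forward(self, tokens, sem_feats):
        # tokens: [B, N, dim], sem_feats: [B, N, sem_dim] (one feature per token)
        return self.block(tokens + self.proj(sem_feats))
```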
October 9, 2025 at 8:14 PM
Chen et al., "TTT3R: 3D Reconstruction as Test-Time Training"
CUT3R + gated updates for states (test-time-training layers) = the fast/efficient performance of CUT3R, but with high-quality estimates.
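The gated-update part, in the abstract (a toy module showing the flavor, not TTT3R's actual test-time-training layer):

```python
import torch

class GatedStateUpdate(torch.nn.Module):
    """The gate decides how much of the old scene state to overwrite with the
    new frame's information, per feature channel."""
    def __init__(self, dim=256):
        super().__init__()
        self.gate = torch.nn.Linear(2 * dim, dim)

    def forward(self, state, update):              # both [N, dim]
        g = torch.sigmoid(self.gate(torch.cat([state, update], dim=-1)))
        return (1.0 - g) * state + g * update      # soft, per-feature overwrite
```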
October 6, 2025 at 5:27 PM