Initial seed noise matters. And you can optimize it **without** any backprop through your denoiser via good ol' linearization. Importantly, you need to do this in Fourier space.
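Here's a minimal zeroth-order sketch of the idea, assuming a frozen black-box `generate` sampler and a scalar `reward` (both hypothetical stand-ins; the paper's actual linearization will differ). The seed's Fourier coefficients are nudged along a finite-difference estimate of the reward slope, so no gradients ever flow through the denoiser:

```python
import torch

def fourier_seed_step(z, generate, reward, eps=0.05, lr=0.5):
    """One linearized update of the seed noise in Fourier space."""
    Z = torch.fft.fft2(z)                              # seed in Fourier space
    delta = torch.randn_like(Z)                        # random probe direction
    # Finite-difference slope of the reward along delta (the linearization)
    r_plus = reward(generate(torch.fft.ifft2(Z + eps * delta).real))
    r_minus = reward(generate(torch.fft.ifft2(Z - eps * delta).real))
    g = (r_plus - r_minus) / (2 * eps)
    Z = Z + lr * g * delta                             # ascend the linearized reward
    z_new = torch.fft.ifft2(Z).real
    return z_new * (z.std() / z_new.std())             # keep roughly unit-Gaussian stats
```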
When distilling vision foundation models with a focus on geometric consistency, insert a feed-forward Gaussian Splatting model in the middle.
Sliding window strategy for long sequences. Makes a lot of sense for practical applications -- uses 60 frames at a time with a 30-frame overlap, plus light loop closure.
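For concreteness, here is what that windowing looks like (a sketch with the numbers above; the per-window alignment and loop closure are where the real work happens):

```python
def sliding_windows(num_frames, window=60, stride=30):
    """Yield (start, end) chunks: 60-frame windows with 30 frames of overlap.
    The shared half-window is what lets per-window reconstructions be
    aligned to each other before light loop closure."""
    start = 0
    while start < num_frames:
        end = min(start + window, num_frames)
        yield start, end
        if end == num_frames:
            break
        start += stride

print(list(sliding_windows(150)))  # [(0, 60), (30, 90), (60, 120), (90, 150)]
```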
Simple, but seemingly effective idea: just randomly masking your diffusion supervision seems to lead to less overfitting (of course?). Not to be confused with masked diffusion -- this is purely a training-time trick.
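A minimal sketch of what this looks like as a loss, assuming per-pixel masking at a fixed ratio (the masking granularity and ratio here are my assumptions):

```python
import torch
import torch.nn.functional as F

def masked_diffusion_loss(pred, target, mask_ratio=0.5):
    """Drop a random subset of the denoising supervision. Note this masks
    the *loss* only -- unlike masked diffusion, the model still sees the
    full noisy latent as input."""
    keep = (torch.rand_like(target[:, :1]) > mask_ratio).float()  # (B,1,H,W) keep mask
    per_pixel = F.mse_loss(pred, target, reduction="none")
    return (per_pixel * keep).sum() / (keep.sum().clamp(min=1) * pred.shape[1])
```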
Who else likes a nice optimization paper? A Gaussian Splatting optimizer that approximates curvature using only the diagonal of the Hessian, estimated efficiently via Hutchinson's method.
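The estimator itself is easy to sketch (this is just Hutchinson's diagonal trick with Hessian-vector products, not the paper's full optimizer): for random +/-1 probes v, E[v * Hv] = diag(H), and each probe costs one double-backprop instead of a full Hessian.

```python
import torch

def hutchinson_hessian_diag(loss, params, num_probes=4):
    """Estimate diag(H) of `loss` w.r.t. `params` without ever forming H."""
    (grad,) = torch.autograd.grad(loss, params, create_graph=True)
    diag = torch.zeros_like(params)
    for _ in range(num_probes):
        v = torch.randint_like(params, 0, 2) * 2 - 1   # Rademacher (+/-1) probe
        (hvp,) = torch.autograd.grad(grad, params, grad_outputs=v, retain_graph=True)
        diag += v * hvp                                # E[v * Hv] = diag(H)
    return diag / num_probes
```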
Video models still suffer from 3D inconsistencies. Generate video -> VGGT -> DPO for better 3D consistency. My personal question: will it ever be perfect?
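Once VGGT gives you a 3D-consistency score to rank generations, the preference objective is just vanilla DPO. A sketch of that loss, assuming pairs are built from the most/least consistent samples per prompt (for diffusion models the log-probs are typically replaced by negative denoising losses, as in Diffusion-DPO):

```python
import torch.nn.functional as F

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO on (3D-consistent, 3D-inconsistent) video pairs; the reference
    model is frozen and beta controls how far the policy may drift from it."""
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    return -F.logsigmoid(beta * margin).mean()
```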
Mean flows, but now in pixel space. Single-step generation with raw pixels has come a long way ;)
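For reference, the MeanFlow recipe in a nutshell (a sketch of my understanding, assuming a linear path z_t = (1-t)x + t*eps; `u_net(z, r, t)` predicts the *average* velocity over [r, t], and its target comes from the MeanFlow identity u = v - (t - r) du/dt, computed with one JVP):

```python
import torch
from torch.func import jvp

def meanflow_loss(u_net, x, eps):
    b = x.shape[0]
    t = torch.rand(b, device=x.device)
    r = torch.rand(b, device=x.device) * t                 # r <= t
    tb = t.view(-1, 1, 1, 1)
    z_t = (1 - tb) * x + tb * eps                          # linear noising path
    v_t = eps - x                                          # instantaneous velocity
    # Total derivative du/dt along (dz/dt, dr/dt, dt/dt) = (v_t, 0, 1)
    u, dudt = jvp(u_net, (z_t, r, t), (v_t, torch.zeros_like(r), torch.ones_like(t)))
    target = (v_t - (t - r).view(-1, 1, 1, 1) * dudt).detach()
    return ((u - target) ** 2).mean()
```

One network evaluation then jumps straight from noise to image: x ≈ eps - u_net(eps, 0, 1).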
Training-free method to "fix" 3DGS with diffusion. Render novel views --> SDEdit + guidance --> refine.
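The SDEdit core is simple enough to sketch with a toy linear noise schedule and a generic one-step `denoise_fn` placeholder (real use plugs in the guided diffusion sampler, and the cleaned render then supervises the splat refinement):

```python
import torch

def sdedit(image, denoise_fn, strength=0.4, num_steps=50):
    """Push a rendered view part-way up the noise schedule, then denoise back.
    strength in (0, 1]: how far to jump (larger = stronger edits)."""
    start = int(strength * num_steps)
    sigma = start / num_steps                      # toy linear noise level
    x = (1 - sigma) * image + sigma * torch.randn_like(image)
    for t in reversed(range(start)):               # only the partial chain
        x = denoise_fn(x, t)
    return x
```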
Improved version of VGGT-based SLAM. What I find really interesting is Layer 22 -- it shows correspondences and can be used to test for overlaps!
Lots of videos have moments where the camera "refocuses". Natural that video models can be used to refocus images ;)
Layout and trajectory-controlled video generator to create video loops from a single image. Neat application, well-engineered.
Keyframes have been a critical idea for SLAM. How you extract and use them still matters in the era of VGGT. A Reinforcement Learning attempt at it.
CUT3R latents adapted to be used together with Video DiT. 3D models seem to be quite useful for rendering things properly in 3D ;)
Segment your latent into regions, choose which region(s) to denoise based on a "complexity" heuristic, then update the rest using past estimates. I.e., each pixel gets its own denoising schedule.
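A sketch of one such step, assuming a patch-level complexity score and a placeholder `denoiser` (a real implementation would route compute only to the selected regions instead of masking a full prediction):

```python
import torch

def regionwise_step(denoiser, latent, t, complexity, top_frac=0.25, patch=8):
    """Denoise only the most 'complex' patches at this step; elsewhere, keep
    the previous estimate -- i.e., each region advances on its own schedule.
    complexity: (B, H//patch, W//patch) heuristic score per patch."""
    pred = denoiser(latent, t)                                   # full-step prediction
    k = max(1, int(top_frac * complexity[0].numel()))
    cutoff = complexity.flatten(1).topk(k, dim=1).values[:, -1]  # per-sample threshold
    mask = (complexity >= cutoff.view(-1, 1, 1)).float()
    mask = mask.repeat_interleave(patch, 1).repeat_interleave(patch, 2).unsqueeze(1)
    return mask * pred + (1 - mask) * latent                     # update chosen regions only
```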
From which data do video models learn different types of motion? Finding this, via backtracking gradients, enables data curation and fine-tuning models toward "better" motion.
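The simplest version of such gradient tracing is sketched below (plain gradient dot products between a motion "probe" loss and per-batch training gradients; the paper's estimator may well be more sophisticated):

```python
import torch

def influence_scores(model, loss_fn, probe_batch, train_batches):
    """Score each training batch by how well its gradient aligns with the
    gradient of a probe loss that isolates the motion of interest. High
    alignment = training on that data pushes the model toward that motion."""
    params = [p for p in model.parameters() if p.requires_grad]
    probe_grad = torch.autograd.grad(loss_fn(model, probe_batch), params)
    scores = []
    for batch in train_batches:
        g = torch.autograd.grad(loss_fn(model, batch), params)
        scores.append(sum((a * b).sum() for a, b in zip(probe_grad, g)).item())
    return scores  # one influence score per training batch
```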
Curated dataset (with VLMs, etc.) to use "frames" as thought chains for text-to-image generation.
VGGT + time-conditioned point map estimates. Similar to MonST3R, but with VGGT. Trained to map to a canonical view, and static points live at a "canonical time".
"Memory" for video models via point cloud-conditioned video generation -- an updatable spatial memory. I am obviously still biased towards having this "explicit" 3D stuff.
Train a DiT to recover the full image from point cloud rasters. Is 3D "cueing" all we need? In a similar spirit to other works that "fix" rough 3D renders.
Flying pixels in DPT-based models come from the fact that DPT modules are convolutional. Introducing MoEs lets you circumvent that. So... sort of like bilateral filtering?
Visual localization and image matching are always on my radar. Even with "modern" methods, perhaps we still want traditional image-based techniques tied together well.
Power of ViTs (DINOv3) + Neural Field decoder for resolution-free depth estimates.
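The decoder side is easy to sketch: bilinearly sample frozen ViT patch features at arbitrary continuous coordinates and decode each query point with a small MLP, so depth can be queried at any resolution. Layer sizes here are illustrative, not the paper's:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NeuralFieldDepthHead(nn.Module):
    def __init__(self, feat_dim=768, hidden=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim + 2, hidden), nn.GELU(),
            nn.Linear(hidden, hidden), nn.GELU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, feats, coords):
        # feats: (B, C, Hp, Wp) ViT patch features; coords: (B, N, 2) in [-1, 1]
        sampled = F.grid_sample(feats, coords.unsqueeze(1), align_corners=False)
        sampled = sampled.squeeze(2).permute(0, 2, 1)                   # (B, N, C)
        return self.mlp(torch.cat([sampled, coords], -1)).squeeze(-1)  # (B, N) depth
```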
A lot. Which makes sense, given that the way 3D foundation models are trained is VERY similar to what video models would see.
Fine-tune a video model to generate the video that would've produced the blurry image. So "live" photos from motion blur, I guess? Neat :)
Paper argues that compositional generalization is infeasible in a pure encoder setup, whereas with a decoder it's easy. Not sure about infeasible, but it's certainly easier with a decoder.
Scene Coordinate Regression networks have shown quite impressive efficiency. Now, here's how you do SLAM with them. Not as accurate, but MUCH leaner.
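The appeal is how little machinery localization needs once the network predicts scene coordinates directly: it reduces to PnP + RANSAC on (pixel, predicted 3D point) pairs, with no feature matching and no stored map beyond the weights. A minimal sketch with OpenCV:

```python
import numpy as np
import cv2

def scr_localize(scene_coords, pts2d, K):
    """scene_coords: (N, 3) network-predicted 3D points for pixels pts2d (N, 2);
    K: 3x3 intrinsics. Returns the camera pose (R, t) or None on failure."""
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        scene_coords.astype(np.float64), pts2d.astype(np.float64), K, None,
        reprojectionError=3.0, iterationsCount=200)
    if not ok:
        return None
    R, _ = cv2.Rodrigues(rvec)
    return R, tvec
```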