Chris Hoang
@choang.bsky.social
PhD student @agentic-ai-lab.bsky.social • prev Meta FAIR, Voleon Group, @umich.edu CS

chrishoang.com
We visualize motion latents by perturbing a spatial feature (green square) and forward predicting to propagate the perturbation.

Feature similarity between the propagated and original perturbation intuitively matches object correspondence!

We can also repeat this over multiple frames for tracking.

January 30, 2026 at 6:57 PM
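
In code, the perturbation-propagation visualization described above could look roughly like the following minimal PyTorch sketch. The forward_predictor interface, the feature shapes, and the single-pixel perturbation are illustrative assumptions, not the released implementation.

    import torch
    import torch.nn.functional as F

    def perturbation_similarity(forward_predictor, feats_t, motion_latent,
                                loc=(8, 8), eps=1.0):
        """feats_t: (1, C, H, W) dense features of frame t.
        Returns an (H, W) similarity heatmap over the predicted frame t+1 features."""
        i, j = loc
        # Original perturbation: a random bump added at one spatial position
        # (the "green square" in the visualization).
        delta = torch.zeros_like(feats_t)
        delta[:, :, i, j] = eps * torch.randn(feats_t.shape[1])

        # Forward predict with and without the perturbation.
        pred_clean = forward_predictor(feats_t, motion_latent)
        pred_pert = forward_predictor(feats_t + delta, motion_latent)

        # Propagated perturbation = difference between the two predictions.
        propagated = pred_pert - pred_clean                        # (1, C, H, W)

        # Cosine similarity between each propagated feature vector and the
        # original perturbation; high values mark corresponding locations.
        original = delta[:, :, i, j]                               # (1, C)
        sim = F.cosine_similarity(propagated, original[:, :, None, None], dim=1)
        return sim[0]                                              # (H, W)

Repeating this frame by frame, with each prediction feeding the next step, gives the multi-frame tracking visualization mentioned in the post.
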
And more visualizations: semantic segmentation and optical flow!
January 30, 2026 at 6:57 PM
Video visualization of semantic segmentation...
January 30, 2026 at 6:57 PM
We find that Midway Network is the only model that performs well on both semantic segmentation and optical flow tasks.

It even outperforms our prior work PooDLe (bsky.app/profile/meng...) without needing an external optical flow network!
January 30, 2026 at 6:57 PM
Forward predictions at higher feature levels are used to infer motion latents at lower levels, motivated by iterative refinement in optical flow methods (e.g. PWCNet, UFlow).

We also introduce learnable gating units on the residual paths of the forward predictors to remove the bias toward the identity mapping.
January 30, 2026 at 6:57 PM
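
A minimal sketch of what such a gated residual forward predictor could look like in PyTorch; the layer sizes, conditioning by concatenation, and the exact gating form are assumptions for illustration, not the paper's architecture.

    import torch
    import torch.nn as nn

    class GatedForwardPredictor(nn.Module):
        """Predicts frame t+1 features from frame t features and motion latents,
        with a learnable gate on the skip (identity) path."""
        def __init__(self, dim, motion_dim):
            super().__init__()
            self.block = nn.Sequential(
                nn.Conv2d(dim + motion_dim, dim, kernel_size=3, padding=1),
                nn.GELU(),
                nn.Conv2d(dim, dim, kernel_size=3, padding=1),
            )
            # Per-channel learnable gate on the identity path, so the predictor
            # is not biased towards simply copying the input features forward.
            self.gate = nn.Parameter(torch.ones(1, dim, 1, 1))

        def forward(self, feats, motion):
            # feats: (B, C, H, W) frame-t features; motion: (B, M, H, W) motion latents.
            update = self.block(torch.cat([feats, motion], dim=1))
            return torch.sigmoid(self.gate) * feats + update

Gating the skip connection, rather than the update, is one way to let the model learn how much of the identity mapping to keep at each channel.
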
Midway Network centers on a midway top-down path that infers motion latents between video frames (inverse dynamics).

The network then predicts future dense latent features from its visual encoders, conditioned on these motion latents (forward dynamics).
January 30, 2026 at 6:57 PM
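
Put together, one step of this inverse/forward dynamics loop might look roughly like the sketch below. The module names (encoder, midway, predictor) and the latent prediction loss are assumptions standing in for the actual components and training objective.

    import torch
    import torch.nn.functional as F

    def dynamics_step(encoder, midway, predictor, frame_t, frame_t1):
        # Dense latent features of both frames from the visual encoder.
        feats_t = encoder(frame_t)        # (B, C, H, W)
        feats_t1 = encoder(frame_t1)      # (B, C, H, W)

        # Inverse dynamics: the midway top-down path infers motion latents
        # from the pair of frames.
        motion = midway(feats_t, feats_t1)        # (B, M, H, W)

        # Forward dynamics: predict the frame t+1 features from the frame t
        # features, conditioned on the motion latents.
        pred_t1 = predictor(feats_t, motion)

        # Latent prediction loss in feature space (an assumed stand-in for
        # the actual objective).
        loss = F.mse_loss(pred_t1, feats_t1.detach())
        return loss, motion
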
Neuroscience theory (e.g. Wolpert et al. 1998) suggests that animals use future prediction and dynamics modeling for perception and control.

Inspired by this, we asked: can latent dynamics modeling learn useful representations of visual observations and their transformations over time, i.e. motion?
January 30, 2026 at 6:57 PM