Lightnews — Scholar-powered news

Chris Hoang

@choang.bsky.social

7 followers 60 following 9 posts

PhD student @agentic-ai-lab.bsky.social • prev Meta FAIR, Voleon Group, @umich.edu CS

chrishoang.com


Chris Hoang

@choang.bsky.social

Many thanks to my advisor @mengyer.bsky.social for his guidance on this project!

Paper: arxiv.org/abs/2510.05558
Website: agenticlearning.ai/midway-network
Code: github.com/agentic-lear...

Midway Network: Learning Representations for Recognition and Motion from Latent Dynamics

Object recognition and motion understanding are key components of perception that complement each other. While self-supervised learning methods have shown promise in their ability to learn from unlabe...

arxiv.org

January 30, 2026 at 6:57 PM

Chris Hoang

@choang.bsky.social

We visualize motion latents by perturbing a spatial feature (green square) and forward predicting to propagate the perturbation.

Feature similarity between the propagated and original perturbation intuitively matches object correspondence!

We can also repeat this over multiple frames for tracking.

January 30, 2026 at 6:57 PM

Chris Hoang

@choang.bsky.social

and more of semantic segmentation and optical flow!

January 30, 2026 at 6:57 PM

Chris Hoang

@choang.bsky.social

Video visualization of semantic segmentation...

January 30, 2026 at 6:57 PM

Chris Hoang

@choang.bsky.social

We find that Midway Network is the only model that performs well on both semantic segmentation and optical flow tasks.

It even outperforms our prior work PooDLe (bsky.app/profile/meng...) without needing an external optical flow network!

January 30, 2026 at 6:57 PM

Chris Hoang

@choang.bsky.social

Forward predictions at higher feature levels are used to infer motion latents at lower levels, motivated by iterative refinement in optical flow methods (e.g. PWCNet, UFlow).

We also introduce learnable gating units on the residual paths of the forward predictors to remove the bias towards the identity mapping.
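
To make the gating idea concrete, here is a minimal sketch of one plausible form of a gated residual path. This is an assumption for illustration, not the paper's implementation: the skip connection is scaled by a gate, so the block can suppress the identity shortcut when copying the input is not a good prediction. The names `gated_residual`, `toy_branch`, and `gate_logit` are hypothetical, and in a real model the gate would be a learned parameter rather than a fixed number.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def gated_residual(x, branch, gate_logit):
    """Return gate * x + branch(x), elementwise, with gate = sigmoid(gate_logit).

    Assumed form for illustration only; the paper may gate differently.
    """
    gate = sigmoid(gate_logit)
    fx = branch(x)
    return [gate * xi + fi for xi, fi in zip(x, fx)]


def toy_branch(x):
    # Hypothetical stand-in for the predictor's learned transformation.
    return [0.5 * xi + 1.0 for xi in x]


x = [1.0, -2.0, 3.0]
# Gate close to 1: behaves like a standard residual block, x + branch(x).
open_gate = gated_residual(x, toy_branch, gate_logit=20.0)
# Gate close to 0: the identity shortcut is suppressed, leaving branch(x).
closed_gate = gated_residual(x, toy_branch, gate_logit=-20.0)
```

With the gate open, the output stays anchored to the input. With it closed, the output depends only on the learned branch, which is the behavior a predictor needs when the next frame differs substantially from the current one.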

January 30, 2026 at 6:57 PM

Chris Hoang

@choang.bsky.social

Midway Network centers around a midway top-down path that infers motion latents between video frames (inverse dynamics).

It predicts future dense latent features from visual encoders, conditioned on these motion latents (forward dynamics).
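
The inverse/forward split can be illustrated with a deliberately tiny toy, which is an assumption for exposition and not the Midway Network architecture. Here the "features" are a 1-D list, the "motion latent" is a single circular shift, inverse dynamics searches for the shift that best explains the change between two frames, and forward dynamics applies that shift to predict the next frame. All function names are hypothetical.

```python
def shift(features, k):
    """Circularly shift a 1-D feature list k positions to the right."""
    n = len(features)
    k %= n
    return features[-k:] + features[:-k] if k else list(features)


def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def infer_motion_latent(z_t, z_next):
    """Toy inverse dynamics: pick the shift that best explains z_t -> z_next."""
    return min(range(len(z_t)), key=lambda k: mse(shift(z_t, k), z_next))


def forward_predict(z_t, motion):
    """Toy forward dynamics: predict the next features from z_t and the latent."""
    return shift(z_t, motion)


z_t = [0.0, 1.0, 4.0, 2.0, 0.0, 0.0]
z_next = shift(z_t, 2)                      # the "next frame" moved by 2 positions
motion = infer_motion_latent(z_t, z_next)   # inverse dynamics
z_pred = forward_predict(z_t, motion)       # forward dynamics
loss = mse(z_pred, z_next)                  # prediction error in feature space
```

In a learned model, both steps would be neural networks operating on dense feature maps, and the prediction error would serve as the training signal. The toy only shows how the two roles fit together.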

January 30, 2026 at 6:57 PM

Chris Hoang

@choang.bsky.social

Neuroscience theory (e.g. Wolpert et al. 1998) suggests that animals use future prediction and dynamics modeling for perception and control.

Inspired by this, we asked: can latent dynamics modeling learn useful representations of visual observations and their transformations over time, i.e. motion?

January 30, 2026 at 6:57 PM
