Lightnews — Scholar-powered news

Pedro Velez

@pdvelez.bsky.social

12 followers 12 following 0 posts

Research Engineer
Google DeepMind

Reposted by Pedro Velez

Mehdi S. M. Sajjadi

@msajjadi.com

Scaling 4D Representations

Self-supervised learning from video does scale! In our latest work, we scaled masked auto-encoding models to 22B params, boosting performance on pose estimation, tracking & more.

Paper: arxiv.org/abs/2412.15212
Code & models: github.com/google-deepmind/representations4d

July 10, 2025 at 11:52 AM

Reposted by Pedro Velez

carldoersch.bsky.social

@carldoersch.bsky.social

We're very excited to introduce TAPNext: a model that sets a new state of the art for Tracking Any Point in videos, by formulating the task as Next Token Prediction. For more, see: tap-next.github.io

April 9, 2025 at 2:04 PM

Reposted by Pedro Velez

Mehdi S. M. Sajjadi

@msajjadi.com

Generative Video Diffusion: does a model trained with this objective learn better features compared to image generation?

We investigated this question and more in our latest work, please check it out!

*From Image to Video: An Empirical Study of Diffusion Representations*
arxiv.org/abs/2502.07001

Video vs. image diffusion representations

Feature visualization for image and video diffusion

February 13, 2025 at 4:11 PM
