Pedro Velez
pdvelez.bsky.social
Research Engineer
Google DeepMind
Reposted by Pedro Velez
Scaling 4D Representations

Self-supervised learning from video does scale! In our latest work, we scaled masked auto-encoding models to 22B params, boosting performance on pose estimation, tracking & more.

Paper: arxiv.org/abs/2412.15212
Code & models: github.com/google-deepmind/representations4d
July 10, 2025 at 11:52 AM
Reposted by Pedro Velez
We're very excited to introduce TAPNext: a model that sets a new state of the art for Tracking Any Point in videos, by formulating the task as Next Token Prediction. For more, see: tap-next.github.io
April 9, 2025 at 2:04 PM
Reposted by Pedro Velez
Generative Video Diffusion: does a model trained with this objective learn better features than one trained for image generation?

We investigated this question and more in our latest work. Please check it out!

*From Image to Video: An Empirical Study of Diffusion Representations*
arxiv.org/abs/2502.07001
February 13, 2025 at 4:11 PM