Max Seitzer
maxseitzer.bsky.social
Research Scientist in the DINO team at Meta FAIR. Previously: PhD at the Max Planck Institute for Intelligent Systems, Tübingen. Representation learning, agents, structure.
Immensely proud to have been part of this project. Thank you to the team: @oriane_simeoni, @huyvvo, @baldassarrefe.bsky.social, Maxime Oquab, Cijo Jose, Vasil Khalidov, Marc Szafraniec, Seungeun Yi, Michael Ramamonjisoa, Francisco Massa, Daniel Haziza, Luca Wehrstedt, Jianyuan Wang, …
August 14, 2025 at 6:52 PM
And here’s my favorite figure from the paper, showing high-resolution DINOv3 representations in all their detail-capturing glory ✨
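Figures like this are typically rendered by projecting each patch feature onto the top principal components and mapping those to colors. A minimal numpy sketch of that recipe (the function name and min-max color scaling are my illustrative choices, not the paper's exact pipeline):

```python
import numpy as np

def patch_features_to_rgb(patches, h, w):
    """Project dense patch features onto their top-3 PCA components
    and map the result to an RGB image for visualization.

    patches: (h * w, dim) array of patch features for one image.
    Returns an (h, w, 3) float image with values in [0, 1].
    """
    centered = patches - patches.mean(axis=0)
    # Top principal directions via SVD of the centered feature matrix
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    proj = centered @ vt[:3].T  # (h*w, 3) PCA coordinates
    # Min-max scale each component into [0, 1] for display
    lo, hi = proj.min(axis=0), proj.max(axis=0)
    rgb = (proj - lo) / (hi - lo + 1e-8)
    return rgb.reshape(h, w, 3)

rng = np.random.default_rng(0)
img = patch_features_to_rgb(rng.standard_normal((14 * 14, 384)), 14, 14)
print(img.shape)  # (14, 14, 3)
```

With real DINOv3 features, semantically similar patches land close in PCA space, so objects emerge as coherent colored regions.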
August 14, 2025 at 6:52 PM
To recap:

1) The promise of SSL is finally realized, enabling foundation models across domains
2) High-quality dense features enabling SotA applications
3) A versatile family of models for diverse deployment scenarios

So many great ideas (Gram anchoring!) went into how we got there, so please read the paper!
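Gram anchoring, roughly, keeps the student's patch-similarity structure (the Gram matrix of normalized patch features) close to that of an earlier checkpoint, which preserves dense feature quality over long training. A loose numpy sketch of the idea (the function name and the mean-squared loss are my assumptions for illustration, not the paper's exact formulation):

```python
import numpy as np

def gram_anchoring_loss(student_patches, anchor_patches):
    """Penalize drift between the patch-similarity (Gram) matrices of the
    student and an earlier 'anchor' checkpoint on the same image.

    student_patches, anchor_patches: (num_patches, dim) feature arrays.
    """
    # L2-normalize patches so the Gram matrix holds cosine similarities
    s = student_patches / np.linalg.norm(student_patches, axis=1, keepdims=True)
    a = anchor_patches / np.linalg.norm(anchor_patches, axis=1, keepdims=True)
    # (num_patches, num_patches) pairwise similarity matrices
    gram_s = s @ s.T
    gram_a = a @ a.T
    # Mean squared difference between the two similarity structures
    return np.mean((gram_s - gram_a) ** 2)

rng = np.random.default_rng(0)
feats = rng.standard_normal((16, 8))
print(gram_anchoring_loss(feats, feats))  # 0.0 for identical features
```

The key design point is that only the *relative* similarities between patches are anchored, so the features themselves remain free to improve.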
August 14, 2025 at 6:52 PM
Satellite, you said? Yes, the same DINOv3 algorithm trained on satellite imagery produces a SotA model for geospatial tasks like canopy height estimation. And, of course, it learns beautiful feature maps. This is the magic of SSL 🪄
August 14, 2025 at 6:52 PM
3) DINOv3 is a family of models covering all use cases:

• ViT-7B flagship model
• ViT-S/S+/B/L/H+ (21M-840M params)
• ConvNeXt variants for efficient inference
• Text-aligned ViT-L (dino.txt)
• ViT-L/7B for satellite

All inheriting the great dense features of the 7B!
August 14, 2025 at 6:52 PM
Well, Jianyuan Wang of VGGT fame simply dropped DINOv3 into his pipeline and off-handedly got a new SotA 3D model out. Seems promising enough?
August 14, 2025 at 6:52 PM
2) DINOv3’s global understanding is strong, but its dense representations truly shine! There’s a clear gap between DINOv3 and prior methods across many tasks. This matters because pretrained dense features power many applications: MLLMs, video & 3D understanding, robotics, generative models, …
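One simple way such applications use frozen dense features: transfer labels between images by matching patches with cosine similarity, no training needed. A toy numpy sketch (function name and setup are mine, purely illustrative):

```python
import numpy as np

def transfer_patch_labels(ref_feats, ref_labels, query_feats):
    """Label each query patch with the label of its most cosine-similar
    reference patch -- a minimal demo of what frozen dense features enable.

    ref_feats: (n_ref, dim), ref_labels: (n_ref,), query_feats: (n_q, dim)
    """
    r = ref_feats / np.linalg.norm(ref_feats, axis=1, keepdims=True)
    q = query_feats / np.linalg.norm(query_feats, axis=1, keepdims=True)
    sims = q @ r.T  # (n_q, n_ref) cosine similarities
    return ref_labels[sims.argmax(axis=1)]

# Toy check with two well-separated feature clusters standing in for
# "object" and "background" patches
rng = np.random.default_rng(0)
ref = np.vstack([rng.normal(+5, 1, (10, 4)), rng.normal(-5, 1, (10, 4))])
labels = np.array([0] * 10 + [1] * 10)
query = np.vstack([np.full((1, 4), +5.0), np.full((1, 4), -5.0)])
print(transfer_patch_labels(ref, labels, query))  # [0 1]
```

The better the dense features separate semantics, the further this kind of zero-shot matching goes, which is exactly where the gap to prior methods shows up.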
August 14, 2025 at 6:52 PM
1) Some history: on ImageNet classification, supervised and weakly-supervised models have converged to the same plateau over the past few years. With DINOv3, SSL finally reaches that level. This alone is a big deal: no more reliance on annotated data!
August 14, 2025 at 6:52 PM
Introducing DINOv3 🦕🦕🦕

A SotA-enabling vision foundation model, trained with pure self-supervised learning (SSL) at scale.
High-quality dense features, combining unprecedented semantic and geometric scene understanding.

Three reasons why this matters👇
August 14, 2025 at 6:52 PM