Introducing DIP: unsupervised post-training that enhances dense features in pretrained ViTs for dense in-context scene understanding
Below: low-shot in-context semantic segmentation examples. DIP features outperform DINOv2!
Paper: arxiv.org/abs/2506.11136
Project page: jafar-upsampler.github.io
GitHub: github.com/PaulCouairon...