Sophia Sirko-Galouchenko 🇺🇦
@ssirko.bsky.social
PhD student in visual representation learning at Valeo.ai and Sorbonne Université (MLIA)
In our paper DIP, we use DiffCut to generate segmentation pseudo-labels - the masks are very high-fidelity, which greatly boosts supervision quality 👏
November 5, 2025 at 6:42 PM
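To make the pseudo-label idea concrete, here is a minimal sketch (an editorial illustration, not the paper's code) of converting a class-agnostic pseudo-mask such as one produced by DiffCut into patch-level targets for a ViT, by majority vote over the pixels each patch covers; mask_to_patch_labels and the 16-pixel patch size are assumptions.

```python
# Minimal sketch: turn a class-agnostic pseudo-mask (e.g. from DiffCut) into
# patch-level pseudo-labels for a ViT. Each patch takes the majority region id
# of the pixels it covers. Illustrative only, not the DIP codebase.
import torch

def mask_to_patch_labels(mask: torch.Tensor, patch_size: int = 16) -> torch.Tensor:
    """mask: (H, W) integer region ids. Returns (H//p, W//p) patch pseudo-labels."""
    H, W = mask.shape
    patches = mask.reshape(H // patch_size, patch_size, W // patch_size, patch_size)
    patches = patches.permute(0, 2, 1, 3).reshape(H // patch_size, W // patch_size, -1)
    labels, _ = torch.mode(patches, dim=-1)  # majority vote inside each patch
    return labels

if __name__ == "__main__":
    fake_mask = torch.randint(0, 5, (224, 224))   # stand-in for a DiffCut mask
    print(mask_to_patch_labels(fake_mask).shape)  # torch.Size([14, 14])
```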
Work done in collaboration with
@spyrosgidaris.bsky.social @vobeckya.bsky.social @abursuc.bsky.social and Nicolas Thome
Paper: arxiv.org/abs/2506.18463
Github: github.com/sirkosophia...
GitHub - sirkosophia/DIP: Official implementation of DIP: Unsupervised Dense In-Context Post-training of Visual Representations
June 25, 2025 at 7:21 PM
6/n Benefits 💪
- < 9h of post-training on a single A100 GPU.
- Improves results across 6 segmentation benchmarks.
- Boosts performance for in-context depth prediction.
- Plug-and-play for different ViTs: DINOv2, CLIP, MAE.
- Robust in low-shot and domain-shift settings.
June 25, 2025 at 7:21 PM
5/n Why is DIP unsupervised?
DIP doesn't require manually annotated segmentation masks for its post-training. To accomplish this, it leverages Stable Diffusion (via DiffCut) alongside DINOv2R features to automatically construct in-context pseudo-tasks.
June 25, 2025 at 7:21 PM
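The post doesn't spell out how the pseudo-tasks are assembled, so the following is only one plausible reading: DINOv2R global features retrieve the most similar images to serve as labeled prompts for a given query, with DiffCut pseudo-masks acting as their "annotations". build_pseudo_task, n_prompts, and the precomputed feature matrix are hypothetical names for illustration.

```python
# Sketch under assumptions (not the authors' pipeline): pair a query image with
# its nearest neighbours under precomputed DINOv2R global features; the DiffCut
# pseudo-masks of those images then serve as the in-context "annotations".
import torch
import torch.nn.functional as F

def build_pseudo_task(global_feats: torch.Tensor, query_idx: int, n_prompts: int = 4):
    """global_feats: (N, D), one L2-normalised feature per dataset image.
    Returns the query index and the indices of its prompt (support) images."""
    sims = global_feats @ global_feats[query_idx]   # cosine similarity to every image
    sims[query_idx] = float("-inf")                 # exclude the query itself
    prompt_idx = sims.topk(n_prompts).indices       # most similar images become prompts
    return query_idx, prompt_idx

if __name__ == "__main__":
    feats = F.normalize(torch.randn(100, 768), dim=-1)  # stand-in for DINOv2R features
    q, prompts = build_pseudo_task(feats, query_idx=0)
    print(q, prompts.tolist())
```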
4/n Meet Dense In-context Post-training (DIP)! 🔄
- Meta-learning inspired: adopts episodic training principles.
- Task-aligned: explicitly mimics downstream dense in-context tasks during post-training.
- Purpose-built: optimizes the model for dense in-context performance.
June 25, 2025 at 7:21 PM
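To make "episodic" and "task-aligned" concrete, here is a toy training step written under editorial assumptions: the backbone is taken to return patch tokens of shape (B, P, D), and the softmax-weighted retrieval of prompt pseudo-labels plus cross-entropy is an illustrative objective, not the official one.

```python
# Toy episodic post-training step (an illustrative reading, not the DIP code):
# query-patch features retrieve pseudo-labels from prompt patches via a softmax
# over similarities, and the loss against the query's own pseudo-labels is
# backpropagated into the ViT backbone.
import torch
import torch.nn.functional as F

def episode_loss(backbone, query_img, prompt_imgs, query_lbl, prompt_lbl, n_classes, tau=0.1):
    """query_img: (1, 3, H, W); prompt_imgs: (K, 3, H, W);
    query_lbl: (P,) patch pseudo-labels; prompt_lbl: (K*P,) patch pseudo-labels.
    Assumes backbone(x) returns patch tokens of shape (B, P, D)."""
    q = F.normalize(backbone(query_img).flatten(0, 1), dim=-1)    # (P, D)
    p = F.normalize(backbone(prompt_imgs).flatten(0, 1), dim=-1)  # (K*P, D)
    attn = F.softmax(q @ p.T / tau, dim=-1)                        # (P, K*P) retrieval weights
    prompt_onehot = F.one_hot(prompt_lbl, n_classes).float()       # (K*P, C)
    pred = attn @ prompt_onehot                                    # soft label map for query patches
    return F.nll_loss(torch.log(pred + 1e-8), query_lbl)
```

Under this reading, repeating an optimizer step on such episode losses is what would mimic the downstream dense in-context task during post-training.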
3/n Most unsupervised (post-)training methods for dense in-context scene understanding rely on self-distillation frameworks with (somewhat) complicated objectives and network components. Hard to interpret, tricky to tune.
Is there a simpler alternative? 👀
June 25, 2025 at 7:21 PM
2/n What is dense in-context scene understanding?
It formulates dense prediction tasks as nearest-neighbor retrieval problems, using patch feature similarities between the query and labeled prompt images (introduced in @ibalazevic.bsky.social et al.'s HummingBird; figure below from their work).
June 25, 2025 at 7:21 PM
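For concreteness, a minimal sketch of that retrieval step: each query patch takes a similarity-weighted vote over the labels of its k nearest prompt patches. The shapes, k, and the temperature are assumptions, not HummingBird's exact protocol.

```python
# Minimal sketch of dense in-context inference in the HummingBird style
# (assumed shapes and hyperparameters): every query patch copies the labels of
# its k most similar prompt patches, weighted by softmaxed cosine similarity.
import torch
import torch.nn.functional as F

def knn_label_propagation(query_feats, prompt_feats, prompt_labels, n_classes, k=30, tau=0.1):
    """query_feats: (Pq, D); prompt_feats: (Pp, D); prompt_labels: (Pp,) ints."""
    q = F.normalize(query_feats, dim=-1)
    p = F.normalize(prompt_feats, dim=-1)
    sims = q @ p.T                                   # (Pq, Pp) cosine similarities
    topv, topi = sims.topk(k, dim=-1)                # keep only the k nearest prompt patches
    weights = F.softmax(topv / tau, dim=-1)          # (Pq, k) soft votes
    neigh = F.one_hot(prompt_labels[topi], n_classes).float()  # (Pq, k, C)
    scores = (weights.unsqueeze(-1) * neigh).sum(dim=1)        # (Pq, C) class scores
    return scores.argmax(dim=-1)                     # predicted label per query patch

if __name__ == "__main__":
    pred = knn_label_propagation(torch.randn(196, 768), torch.randn(4 * 196, 768),
                                 torch.randint(0, 21, (4 * 196,)), n_classes=21)
    print(pred.shape)  # torch.Size([196])
```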