His thesis focused on structured visual representations:
• VidEdit - zero-shot text-to-video editing
• DiffCut - zero-shot segmentation via diffusion features
• JAFAR - high-res visual representation upsampling
- Post-training takes < 9h on a single A100 GPU.
- Improves results across 6 segmentation benchmarks.
- Boosts performance for in-context depth prediction.
- Plug-and-play for different ViTs: DINOv2, CLIP, MAE.
- Robust in low-shot and domain-shift settings.
DIP requires no manually annotated segmentation masks for its post-training. Instead, it leverages Stable Diffusion (via DiffCut) alongside DINOv2R features to automatically construct in-context pseudo-tasks.
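To make that concrete, here is a minimal sketch of how one such pseudo-task could be assembled, assuming pseudo-masks have already been extracted with DiffCut and that `global_feats` holds one L2-normalized DINOv2R image-level feature per image (all names and shapes are illustrative, not DIP's actual code):

```python
import torch

def build_episode(images, pseudo_masks, global_feats, query_idx, n_support=4):
    """Form one in-context pseudo-task: a query plus its most similar images."""
    sims = global_feats @ global_feats[query_idx]        # cosine similarity to query
    sims[query_idx] = -float("inf")                      # exclude the query itself
    support_idx = sims.topk(n_support).indices.tolist()  # semantically close images
    support = [(images[i], pseudo_masks[i]) for i in support_idx]
    query = (images[query_idx], pseudo_masks[query_idx])  # mask = training target
    return support, query
```

Retrieving the support set by feature similarity keeps each episode semantically coherent, so the pseudo-task resembles a real in-context segmentation prompt.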
- Meta-learning-inspired: adopts episodic training principles (see the sketch after this list).
- Task-aligned: explicitly mimics downstream dense in-context tasks during post-training.
- Purpose-built: optimizes the model for dense in-context performance.
Formulate dense prediction tasks as nearest-neighbor retrieval problems, using patch-feature similarities between the query and labeled prompt images (introduced in @ibalazevic.bsky.social et al.'s HummingBird; figure below from their work).
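In code, this retrieval formulation could look roughly like the following, assuming `prompt_feats` (M, D) and `prompt_lbls` (M,) are patch features and per-patch class ids gathered from the labeled prompt images, and `query_feats` (N, D) are the query's patch features (names and shapes are assumptions, not the paper's implementation):

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def nn_retrieval_segment(query_feats, prompt_feats, prompt_lbls,
                         n_classes, k=30, tau=0.1):
    q = F.normalize(query_feats, dim=-1)
    p = F.normalize(prompt_feats, dim=-1)
    sims = q @ p.T                                    # (N, M) patch similarities
    topv, topi = sims.topk(k, dim=-1)                 # k nearest prompt patches
    weights = F.softmax(topv / tau, dim=-1)           # (N, k) similarity weights
    labels = F.one_hot(prompt_lbls[topi], n_classes).float()  # (N, k, C)
    scores = (weights[..., None] * labels).sum(1)     # weighted vote per class
    return scores.argmax(-1)                          # (N,) predicted patch labels
```

The temperature-weighted top-k vote is the standard soft nearest-neighbor decoding; a sharper `tau` approaches a hard 1-NN assignment. Note there is no learned decoder: prediction quality depends entirely on the patch features, which is what DIP's post-training targets.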
Introducing DIP: unsupervised post-training that enhances dense features in pretrained ViTs for dense in-context scene understanding
Below: Low-shot in-context semantic segmentation examples. DIP features outperform DINOv2!