@tommiekerssies.bsky.social
Built by:
👨‍🔬 Tommie Kerssies, Niccolò Cavagnero, Alexander Hermans, Narges Norouzi, Giuseppe Averta, Bastian Leibe, Gijs Dubbelman, Daan de Geus
📍 TU Eindhoven, Polytechnic of Turin, RWTH Aachen University
#ComputerVision #DeepLearning #ViT #ImageSegmentation #EoMT #CVPR2025
(6/6)
March 31, 2025 at 8:35 PM
Segmentation, simplified.
We’re excited to see what you build on top of it. 🛠️
🌐 Project: tue-mps.github.io/eomt
📝 Paper: arxiv.org/abs/2503.19108
💻 Code: github.com/tue-mps/eomt
🤗 Models: huggingface.co/tue-mps
(5/6)
🔗 Your ViT is Secretly an Image Segmentation Model (CVPR 2025)
EoMT shows ViTs can segment efficiently and effectively without adapters or decoders. (tue-mps.github.io)
March 31, 2025 at 8:35 PM
Why does EoMT work?
Large ViTs pre-trained on rich visual data (like DINOv2 🦖) can learn the inductive biases needed for segmentation, with no extra components required.
✅ EoMT removes the clutter and lets the ViT do it all.
(4/6)
March 31, 2025 at 8:35 PM
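To make the backbone claim concrete, here is a minimal sketch of pulling per-patch features from a pretrained DINOv2 ViT, the kind of encoder EoMT builds on. The torch.hub entrypoint and the forward_features output keys come from the public facebookresearch/dinov2 repo; the input size and printed shape are just illustrative.

```python
import torch

# Load a DINOv2 ViT-L/14 backbone from torch.hub
# (real entrypoint from facebookresearch/dinov2; weights download on first call).
backbone = torch.hub.load("facebookresearch/dinov2", "dinov2_vitl14")
backbone.eval()

# Dummy image batch; DINOv2 expects side lengths divisible by the 14-px patch size.
x = torch.randn(1, 3, 518, 518)

with torch.no_grad():
    feats = backbone.forward_features(x)

# Per-patch features: (1, 37*37, 1024) for a 518x518 input with ViT-L/14.
patch_tokens = feats["x_norm_patchtokens"]
print(patch_tokens.shape)
```

These per-patch features are all a segmentation head needs to start from; EoMT's point is that, with a strong enough pretrained ViT, almost nothing has to be added on top.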
How fast can segmentation get without sacrificing accuracy?
✅ EoMT strikes an excellent trade-off between accuracy (PQ) 📊 and speed (FPS) ⚡ on COCO, thanks to its simple encoder-only design.
❌ No complex additional components.
❌ No bottlenecks.
🚀 Just performance.
(3/6)
March 31, 2025 at 8:35 PM
How do modern segmentation models work?
🚫 They chain together complex components:
ViT → Adapter → Pixel Decoder → Transformer Decoder…
✅ EoMT removes them all.
It keeps only the ViT and adds a few learnable query tokens that guide it to predict masks; no separate decoder needed (see the sketch after this post).
(2/6)
March 31, 2025 at 8:35 PM
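For intuition, here is a minimal, illustrative sketch of the encoder-only idea: learnable queries are appended to the patch-token sequence, processed jointly by the last ViT blocks via plain self-attention, and mask logits come from a dot product between query and patch embeddings. Module names, sizes, and the generic PyTorch blocks are assumptions for the sketch; the actual EoMT design (where the queries are inserted, the mask head, and the masked-attention annealing used during training) is specified in the paper.

```python
import torch
import torch.nn as nn


class EncoderOnlySegHead(nn.Module):
    """Toy EoMT-style head (illustrative, not the official implementation).

    Learnable queries ride through the final ViT blocks together with the
    patch tokens; masks come from a dot product between query embeddings
    and patch features. No adapter, pixel decoder, or transformer decoder.
    """

    def __init__(self, blocks, dim=768, num_queries=100, num_classes=133):
        super().__init__()
        self.blocks = blocks                               # the last few ViT blocks
        self.queries = nn.Parameter(torch.randn(num_queries, dim))
        self.class_head = nn.Linear(dim, num_classes + 1)  # +1 for "no object"
        self.mask_proj = nn.Linear(dim, dim)

    def forward(self, patch_tokens):                       # (B, N, dim) from earlier ViT blocks
        B = patch_tokens.shape[0]
        Q = self.queries.shape[0]
        q = self.queries.unsqueeze(0).expand(B, -1, -1)
        tokens = torch.cat([q, patch_tokens], dim=1)       # queries join the token sequence
        for blk in self.blocks:                            # plain self-attention blocks
            tokens = blk(tokens)
        q, patches = tokens[:, :Q], tokens[:, Q:]
        class_logits = self.class_head(q)                  # (B, Q, C+1)
        mask_logits = torch.einsum("bqd,bnd->bqn", self.mask_proj(q), patches)
        return class_logits, mask_logits                   # masks over the patch grid


if __name__ == "__main__":
    # Stand-in for the final ViT blocks: generic transformer encoder layers.
    blocks = nn.ModuleList(
        nn.TransformerEncoderLayer(d_model=768, nhead=12, batch_first=True)
        for _ in range(4)
    )
    head = EncoderOnlySegHead(blocks)
    cls_logits, mask_logits = head(torch.randn(2, 196, 768))
    print(cls_logits.shape, mask_logits.shape)  # (2, 100, 134), (2, 100, 196)
```

At inference, the per-query mask logits are reshaped to the patch grid (14×14 for 196 tokens here) and upsampled to pixel resolution, so the ViT really does do all the heavy lifting.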