✅ EoMT achieves an optimal trade-off between accuracy (PQ) 📊 and speed (FPS) ⚡ on COCO, thanks to its simple encoder-only design.
❌ No complex additional components.
❌ No bottlenecks.
🚀 Just performance.
(3/6)
🚫 Existing ViT-based segmentation models chain together complex components:
ViT → Adapter → Pixel Decoder → Transformer Decoder…
✅ EoMT removes them all.
It keeps only the ViT and adds a few query tokens that guide it to predict masks, no decoder needed.
(2/6)
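To make the idea in (2/6) concrete, here is a rough NumPy sketch, not the actual EoMT code. It assumes the query tokens are simply concatenated with the patch tokens and run through the same encoder blocks, and that each mask is read out as a dot product between a query and the patch tokens. The names `self_attention_block` and `encoder_only_masks`, the single-head attention stand-in for a ViT block, and the random weights are all hypothetical and chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention_block(tokens, w_q, w_k, w_v):
    # Hypothetical stand-in for one ViT block:
    # single-head self-attention with a residual connection.
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    attn = softmax(q @ k.T / np.sqrt(tokens.shape[-1]))
    return tokens + attn @ v

def encoder_only_masks(patch_tokens, query_tokens, blocks, grid_hw):
    # Assumption: queries and patches share the same encoder blocks.
    num_queries = query_tokens.shape[0]
    tokens = np.concatenate([query_tokens, patch_tokens], axis=0)
    for w_q, w_k, w_v in blocks:
        tokens = self_attention_block(tokens, w_q, w_k, w_v)
    queries, patches = tokens[:num_queries], tokens[num_queries:]
    # Assumption: one mask logit per (query, patch) pair via dot product.
    mask_logits = queries @ patches.T
    return mask_logits.reshape(num_queries, *grid_hw)

dim, grid_hw, num_queries = 16, (4, 4), 3
patch_tokens = rng.normal(size=(grid_hw[0] * grid_hw[1], dim))
query_tokens = rng.normal(size=(num_queries, dim))
blocks = [
    tuple(rng.normal(scale=dim ** -0.5, size=(dim, dim)) for _ in range(3))
    for _ in range(2)
]

masks = encoder_only_masks(patch_tokens, query_tokens, blocks, grid_hw)
print(masks.shape)  # (3, 4, 4): one low-resolution mask per query
```

Note there is no adapter, pixel decoder, or Transformer decoder anywhere in the sketch: the only extra ingredient beyond the encoder is the small set of query tokens.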
Why build a rocket engine full of bolted-on subsystems when one elegant unit does the job? 💡
That’s what we did for segmentation.
✅ Meet the Encoder-only Mask Transformer (EoMT): tue-mps.github.io/eomt (CVPR 2025)
(1/6)