paulgavrikov.github.io
paulgavrikov.github.io/visualoverlo...
• Try your model: huggingface.co/spaces/paulg...
• Dataset: huggingface.co/datasets/pau...
• Code: github.com/paulgavrikov...
Even top VLMs break down on “easy” tasks in overloaded scenes.
Best model (o3):
• 19.8% accuracy (hardest split)
• 69.5% overall
• 2,720 Q–A pairs
• 6 vision tasks
• 150 fresh, high-res, royalty-free artworks
• Privately held ground-truth responses