1️⃣ Spot The Ball task with human baselines
2️⃣ Large dataset including soccer, basketball, and volleyball images
3️⃣ Scalable image-generation pipeline for any sport with a ball
We built the Spot The Ball benchmark to test visual social inference – the ability to infer missing information from others’ behavior – in Vision Language Models.
Try the task yourself here: nehabalamurugan.com/spot-the-bal...
I'll be presenting this work at #CogSci2025:
📍 Poster P1-B-8
🗓️ Session: Poster Session 1
🧠 Poster title: “Spot the Ball: Evaluating Visual Causal Inference in VLMs under Occlusion”
✅ An inpainting-based image generation pipeline (rough sketch below)
✅ A public demo where you can test your visual inference skills
✅ A dataset of 3000+ labeled soccer images for future work
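For the curious, the masking idea behind the pipeline can be sketched in a few lines. This is a toy illustration, not our actual pipeline: the file names, the ball annotation, and the choice of OpenCV's Telea inpainting are all assumptions here.

```python
# Toy sketch of inpainting a ball out of a sports frame.
# Illustrative only: paths and the ball annotation are hypothetical,
# and a real pipeline may use a learned inpainting model instead.
import cv2
import numpy as np

img = cv2.imread("frame.jpg")  # hypothetical input frame
h, w = img.shape[:2]

# Hypothetical labeled ball location: center (bx, by) and radius r, in pixels.
bx, by, r = 512, 300, 18

# Binary mask covering the ball plus a small margin.
mask = np.zeros((h, w), dtype=np.uint8)
cv2.circle(mask, (bx, by), r + 4, 255, thickness=-1)

# Fill the masked region from surrounding pixels, erasing the ball.
result = cv2.inpaint(img, mask, inpaintRadius=5, flags=cv2.INPAINT_TELEA)
cv2.imwrite("frame_no_ball.jpg", result)
```

One practical note: learned inpainting tends to leave fewer low-level artifacts than classical methods, which matters if you don't want players (or models) to spot the smudge instead of the ball.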
Humans outperform all models, even with chain-of-thought scaffolding.
GPT-4o gets closer with explicit pose/gaze cues, but still falls short in many cases.
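To make "chain-of-thought scaffolding" and "explicit pose/gaze cues" concrete, here's the flavor of prompt we mean. The wording below is an invented illustration, not the exact prompt from our evaluation.

```python
# Invented illustration of pose/gaze-cued chain-of-thought prompting;
# not the exact prompt used in the evaluation.
prompt = (
    "The ball has been digitally removed from this sports image.\n"
    "Step 1: Describe each visible player's body pose.\n"
    "Step 2: Describe where each player is looking.\n"
    "Step 3: Reason about where the ball must be for those poses and "
    "gazes to make sense, then answer with pixel coordinates (x, y)."
)
```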
We benchmark humans and models (GPT-4o, Gemini, LLaMA, Qwen) on soccer, basketball, and volleyball.
We isolate this in a simple but rich task: spot the masked ball from a single frame.
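A guess on this task is just an (x, y) point, so a natural way to score it is distance to the true ball center, normalized by image size. The metric below is an illustrative choice, not necessarily the paper's exact scoring rule.

```python
import math

def ball_error(guess, truth, width, height):
    """Euclidean distance between guessed and true ball centers,
    normalized by the image diagonal so errors are comparable across
    image sizes. Illustrative metric, not necessarily the paper's."""
    dx, dy = guess[0] - truth[0], guess[1] - truth[1]
    return math.hypot(dx, dy) / math.hypot(width, height)

# Example: a guess 50 px from the truth in a 1280x720 frame.
print(round(ball_error((700, 350), (660, 320), 1280, 720), 3))  # 0.034
```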
🗓️ Spot the Ball began in the UK in the 1970s as a popular newspaper contest
👥 At its peak, over 3 million people played weekly
Players had to guess where the ball had been removed from a photo, exactly what our benchmark asks of humans and models today.
We ask: Can people and models locate a hidden ball in sports images using only visual context and reasoning?
🕹️ Try the task: v0-new-project-9b5vt6k9ugb.vercel.app
#CogSci2025