🧩 Website: nehabalamurugan.com/spot-the-bal...
📊 Dataset: huggingface.co/datasets/neh...
📄 Preprint: arxiv.org/abs/2511.00261
1️⃣ Spot The Ball task with human baselines
2️⃣ Large dataset including soccer, volleyball, and basketball images
3️⃣ Scalable image-generation pipeline for any sport with a ball
I'll be presenting this work at #CogSci2025:
📍 Poster Number P1-B-8
🗓️ Poster Session: Poster Session 1
🧠 Poster title: “Spot the Ball: Evaluating Visual Causal Inference in VLMs under Occlusion”
✅ An inpainting-based image generation pipeline
✅ A public demo where you can test your visual inference skills
✅ A dataset of 3000+ labeled soccer images for future work
Humans outperform all models—even with chain-of-thought scaffolding.
GPT-4o gets closer with explicit pose/gaze cues, but still falls short in many cases.
🔹 Basic: “Which grid cell contains the ball?”
🔹 Implicit: Encourages attention to pose/gaze
🔹 Chain-of-thought: Step-by-step inference
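The three prompt levels can be sketched as templates for a vision-chat API. This is a minimal illustration, not the paper's exact wording: the prompt text and the `build_messages` helper are assumptions for the sake of example.

```python
# Hypothetical sketch of the three prompt levels described above.
# Only the "basic" question is quoted from the thread; the other
# wordings are illustrative assumptions.
GRID_QUESTION = "Which grid cell contains the ball?"

PROMPTS = {
    # Basic: ask directly for the grid cell
    "basic": GRID_QUESTION,
    # Implicit: nudge the model toward pose/gaze cues
    "implicit": (
        "Look at the players' poses and gaze directions. " + GRID_QUESTION
    ),
    # Chain-of-thought: request step-by-step reasoning first
    "chain_of_thought": (
        "Reason step by step: describe each player's pose and gaze, "
        "infer where their attention converges, then answer: " + GRID_QUESTION
    ),
}

def build_messages(level: str, image_url: str) -> list[dict]:
    """Pair a masked-ball image with one of the three prompt levels,
    using a generic chat-style message format."""
    return [{
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": image_url}},
            {"type": "text", "text": PROMPTS[level]},
        ],
    }]
```

The same image is sent under each level, so any accuracy difference between levels isolates the effect of the prompt rather than the input.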
We benchmark humans and models (GPT-4o, Gemini, LLaMA, Qwen) on soccer, basketball, and volleyball.
We isolate this in a simple but rich task: spot the masked ball from a single frame.
🗓️ It began in the UK in the 1970s as a popular newspaper contest
👥 At its peak, over 3 million people played weekly
Players had to guess where the ball had been removed from a photo—just like our benchmark does today.