🚨Thrilled to share "Caption This, Reason That", a #NeurIPS2025 Spotlight! 🔦
Meet us at #2112, 3 Dec 11 a.m.
We analyze VLM limitations through the lens of Cognitive Science (Perception, Attention, Memory) and propose a simple "Self-Captioning" method that boosts spatial reasoning by ~18%.
🧵👇