anjaliwgupta.bsky.social
@anjaliwgupta.bsky.social
Prompting for “cognitive maps,” a concept Edward Tolman introduced in the ’40s for the unified representations of spatial environments that brains build, we find that MLLMs have a local spatial bias and that explicitly remembering spaces improves relational distance abilities. [6/n]
December 23, 2024 at 10:48 PM
What does it mean to “think in space”? We analyze spatial intelligence linguistically and visually.

We analyze self-explanations to attribute VSI-Bench performance to visual-spatial capabilities and find that spatial and linguistic intelligence are very distinct. [5/n]
December 23, 2024 at 10:47 PM
VSI-Bench tests configuration, measurement estimation, and spatiotemporal abilities across 5k+ Video QA pairs and eight task types.

We evaluate open- and closed-source MLLMs on VSI-Bench and find that they exhibit competitive, though subhuman, visual-spatial intelligence. [4/n]
December 23, 2024 at 10:47 PM