Paper: “A Deep Learning Model of Mental Rotation Informed by Interactive VR Experiments” (arxiv.org/abs/2512.13517)
Thanks for reading this thread!
While questions remain (see Discussion), in this work we present a model that provides a mechanistic account of mental rotation and highlights how deep, equivariant, and symbolic representations can support spatial reasoning in artificial systems.
(Systematic ablations demonstrate the necessity of each module.)
This brings us to the third module: an MLP that sequentially predicts, from pairs of symbolic codes, similarity decisions and rotation actions to apply to the 3D latent space.
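A minimal sketch of what such a decision module could look like, as one shared hidden layer feeding a similarity head and a rotation-action head. All dimensions, the two-head design, and the action set are assumptions for illustration, not the paper's exact architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

CODE_DIM = 16   # assumed size of one symbolic code
HIDDEN = 32     # assumed hidden width
N_ACTIONS = 6   # assumed discrete rotation actions (e.g. +/-90 deg per axis)

# One hidden layer shared by two heads: a similarity decision and a rotation action.
W1 = rng.normal(scale=0.1, size=(2 * CODE_DIM, HIDDEN))
W_sim = rng.normal(scale=0.1, size=(HIDDEN, 1))
W_act = rng.normal(scale=0.1, size=(HIDDEN, N_ACTIONS))

def step(code_a, code_b):
    """One decision step: compare two symbolic codes and return
    (same-object probability, index of rotation action to apply)."""
    h = np.tanh(np.concatenate([code_a, code_b]) @ W1)
    p_same = 1.0 / (1.0 + np.exp(-(h @ W_sim)[0]))  # sigmoid similarity head
    action = int(np.argmax(h @ W_act))              # rotation-action head
    return p_same, action

p, a = step(rng.normal(size=CODE_DIM), rng.normal(size=CODE_DIM))
print(p, a)
```

In the full model this step would repeat, applying the chosen rotation to the latent and re-encoding, until the similarity head fires.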
Each object has a unique code per quadrant; mental rotation reduces to switching quadrants until alignment.
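The quadrant-switching idea can be pictured with a toy alignment loop. The four-quadrant partition and string codes here are illustrative placeholders, not the model's actual discretization:

```python
# Toy sketch: each object carries one code per viewpoint "quadrant".
# Mental rotation = stepping through quadrants until the two codes match.
codes = {  # hypothetical per-quadrant codes for one object
    0: "A0", 1: "A1", 2: "A2", 3: "A3",
}

def align(start_quadrant, target_code, max_steps=4):
    """Rotate in 90-degree steps (quadrant switches) until codes align."""
    q = start_quadrant
    for n_steps in range(max_steps):
        if codes[q] == target_code:
            return q, n_steps  # aligned after n_steps rotations
        q = (q + 1) % 4        # one rotation action = one quadrant switch
    return None, max_steps     # never aligned: likely a different object

print(align(1, "A3"))  # -> (3, 2): two quadrant switches needed
```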
We ran interactive VR experiments where participants could rotate objects using a thumbstick.
We found that participants typically take a single action to roughly align the objects before judging similarity.
The autoencoder extracts a 3D-structured latent representation from a 2D view of an object, and novel views can be synthesized by rotating the latent space.
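The "rotate the latent" idea can be illustrated with a plain rotation matrix applied to a point-cloud-style 3D latent. This is a sketch: the paper's latent need not literally be a point cloud, and the decoder that would render the rotated latent back into a 2D view is omitted:

```python
import numpy as np

def rotation_z(theta):
    """3x3 rotation matrix about the z axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])

# Assumed latent: N points in 3D, as an (N, 3) array produced by the encoder.
latent = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0],
                   [0.0, 0.0, 1.0]])

# Rotating the latent 90 degrees about z, then decoding, would yield a novel view.
rotated = latent @ rotation_z(np.pi / 2).T
print(np.round(rotated, 6))
```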
Each module handles a specific step, and together they sequentially solve the task and account for the underlying process of mental rotation.
Reaction times grew with angular difference, even for rotations in depth, suggesting that humans can mentally infer and manipulate 3D representations.
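This is the classic Shepard–Metzler-style signature: a roughly linear increase of reaction time with angular disparity, typically summarized as a slope in milliseconds per degree. A toy version of that analysis, with synthetic numbers purely for illustration (not the paper's data):

```python
import numpy as np

# Hypothetical mean reaction times (seconds) at several angular disparities (degrees).
angles = np.array([0.0, 45.0, 90.0, 135.0, 180.0])
rts = np.array([1.0, 1.5, 2.0, 2.5, 3.0])  # synthetic, perfectly linear for clarity

# Least-squares line: the slope is the extra RT cost per degree of rotation.
slope, intercept = np.polyfit(angles, rts, 1)
print(f"{slope * 1000:.1f} ms of RT per degree of rotation")
```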