www.duanjiafei.com
Mixing static & dynamic training data results in significant accuracy gains across all tasks. 📊
Mixing static & dynamic training data results in significant accuracy gains across all tasks. 📊
Even strong models perform near-randomly on SAT's dynamic tasks.
Egocentric movement and multiview reasoning remain tough nuts to crack.
Even strong models perform near-randomly on SAT's dynamic tasks.
Egocentric movement and multiview reasoning remain tough nuts to crack.
Egocentric Movement
Object Movement
Allocentric Perspective
Goal Aiming
Action Consequence
Each task tests unique dimensions of spatial cognition.🧠
Egocentric Movement
Object Movement
Allocentric Perspective
Goal Aiming
Action Consequence
Each task tests unique dimensions of spatial cognition.🧠
Uses ProcTHOR for 3D scenes.
Procedurally generates static & dynamic QAs.
Scalable, cost-effective, & adaptable for new tasks. 🏠
Uses ProcTHOR for 3D scenes.
Procedurally generates static & dynamic QAs.
Scalable, cost-effective, & adaptable for new tasks. 🏠
+23% on CVBench
+9% on BLINK (harder benchmarks)
+18% on Visual Spatial Relations (VSR) dataset. 💪
+23% on CVBench
+9% on BLINK (harder benchmarks)
+18% on Visual Spatial Relations (VSR) dataset. 💪
Static: Is object X to the left of object Y?
Dynamic: How did the camera move between frames? Did the object get closer or further?
Perspective: What does object placement look like from point X?
Static: Is object X to the left of object Y?
Dynamic: How did the camera move between frames? Did the object get closer or further?
Perspective: What does object placement look like from point X?
Prompt: Move the mobile robot to the table and place the red bowl onto the table.
Prompt: Move the mobile robot to the table and place the red bowl onto the table.