Jiafei Duan
djiafei.bsky.social
Robotics PhD student @uwcse | Graduate Student Researcher @allen_ai | Ex-@NVIDIA | @ASTARsg scholars | BEng from @ntueee. Research in robot learning and embodied AI

www.duanjiafei.com
8/🧵Challenges MLMs face:
Even strong models perform near-randomly on SAT's dynamic tasks.
Egocentric movement and multiview reasoning remain tough nuts to crack.
December 11, 2024 at 4:12 PM
6/🧵How does SAT generate data?
Uses ProcTHOR for 3D scenes.
Procedurally generates static & dynamic QAs.
Scalable, cost-effective, & adaptable for new tasks. 🏠
December 11, 2024 at 4:12 PM
5/🧵Here's the kicker: Fine-tuning on SAT makes the open-source LLaVA-13B model match or surpass proprietary giants like GPT-4V in spatial reasoning! 🎯
December 11, 2024 at 4:12 PM
4/🧵 Results? SAT improves performance not only on its own dataset but also boosts zero-shot spatial reasoning:
+23% on CVBench
+9% on BLINK (harder benchmarks)
+18% on Visual Spatial Relations (VSR) dataset. 💪
December 11, 2024 at 4:12 PM
3/🧵Example tasks SAT tackles:
Static: Is object X to the left of object Y?
Dynamic: How did the camera move between frames? Did the object get closer or further?
Perspective: What does object placement look like from point X?
December 11, 2024 at 4:12 PM
2/🧵SAT introduces 218K question-answer pairs for 22K synthetic scenes created using a photorealistic physics engine. It goes beyond static benchmarks to tackle dynamic reasoning tasks like egocentric actions, object movement, & perspective-taking. 🔍
December 11, 2024 at 4:12 PM
🚀Excited to introduce our latest work, SAT: Spatial Aptitude Training, a groundbreaking approach to enhance spatial reasoning in Multimodal Language Models (MLMs). SAT goes beyond understanding static object positions and dives deep into dynamic spatial reasoning. 🧵👇
December 11, 2024 at 4:12 PM
A scene from ManiSkill.
Prompt: Move the mobile robot to the table and place the red bowl onto the table.
December 10, 2024 at 6:51 PM
I am impressed by Sora and see potential for using it in robotics.
December 10, 2024 at 5:40 AM