Lightnews — Scholar-powered news

Jiafei Duan

@djiafei.bsky.social

160 followers 330 following 19 posts

Robotics PhD student @uwcse | Graduate Student Researcher @allen_ai | Ex-@NVIDIA | @ASTARsg scholar | BEng from @ntueee. Research in robot learning and embodied AI.

www.duanjiafei.com


Jiafei Duan

@djiafei.bsky.social

8/🧵Challenges MLMs face:
Even strong models perform near-randomly on SAT's dynamic tasks.
Egocentric movement and multiview reasoning remain tough nuts to crack.

December 11, 2024 at 4:12 PM

Jiafei Duan

@djiafei.bsky.social

6/🧵How does SAT generate data?
Uses ProcTHOR for 3D scenes.
Procedurally generates static & dynamic QAs.
Scalable, cost-effective, & adaptable for new tasks. 🏠

December 11, 2024 at 4:12 PM

Jiafei Duan

@djiafei.bsky.social

5/🧵Here's the kicker: Fine-tuning on SAT makes the open-source LLaVA-13B model match or surpass proprietary giants like GPT-4V in spatial reasoning! 🎯

December 11, 2024 at 4:12 PM

Jiafei Duan

@djiafei.bsky.social

4/🧵 Results? SAT improves performance not only on its own dataset but also boosts zero-shot spatial reasoning:
+23% on CVBench
+9% on BLINK (harder benchmarks)
+18% on Visual Spatial Relations (VSR) dataset. 💪

December 11, 2024 at 4:12 PM

Jiafei Duan

@djiafei.bsky.social

3/🧵Example tasks SAT tackles:
Static: Is object X to the left of object Y?
Dynamic: How did the camera move between frames? Did the object get closer or further?
Perspective: What does object placement look like from point X?

December 11, 2024 at 4:12 PM
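
To make the static task above concrete, here is a minimal sketch of how a "left of" question could be generated from known object positions. It is an illustration only, not SAT's actual pipeline: the `SceneObject` class, the `static_left_of_qas` function, the pixel coordinates, and the `min_gap` threshold are all hypothetical, and no ProcTHOR API is used.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class SceneObject:
    # Hypothetical record: object name and the horizontal image coordinate
    # of its center in pixels (smaller x means further left in the frame).
    name: str
    x: float


def static_left_of_qas(objects, min_gap=10.0):
    """Return (question, answer) pairs for ordered pairs of clearly separated objects."""
    qas = []
    for a, b in permutations(objects, 2):
        if abs(a.x - b.x) < min_gap:
            continue  # skip pairs too close together to label unambiguously
        question = f"Is the {a.name} to the left of the {b.name}?"
        answer = "yes" if a.x < b.x else "no"
        qas.append((question, answer))
    return qas


# Made-up scene for illustration; a real generator would read positions
# from the simulator's ground-truth scene state.
scene = [
    SceneObject("mug", 120.0),
    SceneObject("lamp", 480.0),
    SceneObject("book", 125.0),
]

for question, answer in static_left_of_qas(scene):
    print(question, answer)
```

Because the answers come from ground-truth coordinates rather than human labels, a generator of this kind can scale to many scenes cheaply. The dynamic and perspective tasks would need camera poses across frames, which this sketch does not cover.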

Jiafei Duan

@djiafei.bsky.social

2/🧵SAT introduces 218K question-answer pairs for 22K synthetic scenes created using a photorealistic physics engine. It goes beyond static benchmarks to tackle dynamic reasoning tasks like egocentric actions, object movement, & perspective-taking. 🔍

December 11, 2024 at 4:12 PM

Jiafei Duan

@djiafei.bsky.social

🚀Excited to introduce our latest work, SAT: Spatial Aptitude Training, a groundbreaking approach to enhancing spatial reasoning in Multimodal Language Models (MLMs). SAT goes beyond understanding static object positions and dives deep into dynamic spatial reasoning. 🧵👇

December 11, 2024 at 4:12 PM

Jiafei Duan

@djiafei.bsky.social

A scene from ManiSkill.
Prompt: Move the mobile robot to the table and place the red bowl onto the table.

December 10, 2024 at 6:51 PM

Jiafei Duan

@djiafei.bsky.social

I am impressed by Sora and see potential for using it in robotics.

December 10, 2024 at 5:40 AM
