Xingyu Chen
xingyu-chen.bsky.social
PhD Student at Westlake University, working on 3D & 4D Foundation Models.
https://rover-xingyu.github.io/
Instead of updating all states uniformly, we incorporate image attention as per-token learning rates.

High-confidence matches get larger updates, while low-quality updates are suppressed.

This soft gating greatly extends the length generalization beyond the training context.
October 1, 2025 at 3:26 PM
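The soft gating described above can be sketched as a gated recurrent state update (a minimal NumPy sketch of the idea, not the actual TTT3R implementation; the array shapes and the temperature `beta` are illustrative assumptions):

```python
import numpy as np

def gated_state_update(state, update, attn, beta=1.0):
    """Blend the recurrent state with new observations using
    per-token learning rates derived from attention confidence.

    state, update: (num_tokens, dim) arrays
    attn: (num_tokens,) attention-based confidences in [0, 1]
    beta: hypothetical temperature sharpening the gate
    """
    # Per-token learning rate: high-confidence matches get larger
    # updates, while low-quality updates are suppressed.
    lr = attn[:, None] ** beta            # shape (num_tokens, 1)
    return (1.0 - lr) * state + lr * update

# Toy usage: a confident token moves most of the way toward the
# update, while an unconfident token barely changes.
state = np.zeros((2, 4))
update = np.ones((2, 4))
attn = np.array([0.9, 0.1])
new_state = gated_state_update(state, update, attn)
```

With `beta = 1` this reduces to a plain convex blend, so a token with confidence 0.9 takes 90% of its update while a 0.1-confidence token takes only 10%.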
#VGGT: accurate within short clips, but slow and prone to Out-of-Memory (OOM) errors.

#CUT3R: fast with constant memory usage, but forgets.

We revisit them from a Test-Time Training (TTT) perspective and propose #TTT3R to get all three: fast, accurate, and OOM-free.
October 1, 2025 at 3:24 PM
With our estimated segmentation masks, we perform a second inference pass with re-weighted attention, enabling robust 4D reconstruction. It even outperforms SOTA methods trained on 4D datasets, at almost no extra cost over vanilla DUSt3R.
April 1, 2025 at 3:25 PM
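The mask-based re-weighting for the second pass can be sketched as follows (a simplified NumPy sketch under my own assumptions; the actual Easi3R re-weighting scheme may differ, and `down` is a hypothetical suppression weight):

```python
import numpy as np

def reweight_attention(attn, dynamic_mask, down=0.0):
    """Suppress attention toward tokens flagged as dynamic before a
    second inference pass.

    attn: (num_queries, num_keys) attention weights
    dynamic_mask: (num_keys,) booleans, True = dynamic token
    down: weight applied to dynamic tokens (0 removes them)
    """
    w = np.where(dynamic_mask[None, :], down, 1.0)
    attn = attn * w
    # Renormalize each query's weights so they sum to 1 again.
    return attn / np.clip(attn.sum(axis=-1, keepdims=True), 1e-8, None)

# Toy usage: 4 keys, the last two flagged as dynamic.
attn = np.full((1, 4), 0.25)
mask = np.array([False, False, True, True])
reweighted = reweight_attention(attn, mask)
```

After re-weighting, the remaining (static) keys share all of the attention mass, which is what lets the second pass reconstruct the static scene without interference from moving objects.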
We propose an attention-guided strategy to decompose dynamic objects from the static background, enabling robust dynamic object segmentation. It outperforms optical-flow-guided segmentation (e.g., MonST3R) and models trained on dynamic mask labels (e.g., DAS3R).
April 1, 2025 at 3:24 PM
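One way to turn attention maps into a dynamic mask is to aggregate them and threshold (a hypothetical NumPy sketch, not Easi3R's actual procedure; the threshold `tau` and the "low attention = dynamic" convention are assumptions for illustration):

```python
import numpy as np

def dynamic_mask_from_attention(attn_maps, tau=0.5):
    """Derive a dynamic-object mask by aggregating spatial attention
    responses across layers.

    attn_maps: (num_layers, H, W) per-layer attention responses
    tau: hypothetical threshold on the aggregated response
    Returns a (H, W) boolean mask, True = dynamic region.
    """
    # Normalize each layer's map to [0, 1], then average over layers.
    lo = attn_maps.min(axis=(1, 2), keepdims=True)
    hi = attn_maps.max(axis=(1, 2), keepdims=True)
    agg = ((attn_maps - lo) / np.clip(hi - lo, 1e-8, None)).mean(axis=0)
    # Regions the model attends to weakly (cross-view inconsistent)
    # are treated as dynamic.
    return agg < tau

# Toy usage: one 2x2 attention map with a single weak-response pixel.
maps = np.array([[[1.0, 0.0],
                  [1.0, 1.0]]])
mask = dynamic_mask_from_attention(maps, tau=0.5)
```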
💡Humans naturally separate ego-motion from object-motion without dynamic labels. We observe that #DUSt3R has implicitly learned a similar mechanism, reflected in its attention layers.
April 1, 2025 at 3:23 PM
🦣Easi3R: 4D Reconstruction Without Training!

Limited 4D datasets? Take it easy.

#Easi3R adapts #DUSt3R for 4D reconstruction by disentangling and repurposing its attention maps → making 4D reconstruction easier than ever!

🔗Page: easi3r.github.io
April 1, 2025 at 3:21 PM