Xingyu Chen
xingyu-chen.bsky.social
PhD Student at Westlake University, working on 3D & 4D Foundation Models.
https://rover-xingyu.github.io/
Instead of updating all states uniformly, we incorporate image attention as per-token learning rates.

High-confidence matches receive larger updates, while low-confidence ones are suppressed.

This soft gating greatly extends the length generalization beyond the training context.
October 1, 2025 at 3:26 PM
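A minimal NumPy sketch of the soft gating described above, where attention confidence acts as a per-token learning rate (the function name, shapes, and the convex-blend form are illustrative assumptions, not the paper's exact update rule):

```python
import numpy as np

def gated_state_update(state, update, attn_conf):
    """Blend proposed updates into per-token states, using attention
    confidence as a per-token learning rate (soft gate).

    state:     (N, D) current token states
    update:    (N, D) proposed updates from the new observation
    attn_conf: (N,)   per-token attention confidence in [0, 1]
    """
    gate = attn_conf[:, None]  # (N, 1) per-token learning rate
    # high-confidence tokens move toward the update;
    # low-confidence tokens largely keep their old state
    return (1.0 - gate) * state + gate * update

# toy example: three tokens with confidences 1.0, 0.5, 0.0
state = np.zeros((3, 2))
update = np.ones((3, 2))
conf = np.array([1.0, 0.5, 0.0])
new_state = gated_state_update(state, update, conf)
```

Because each token's update magnitude is scaled independently, unreliable tokens decay toward no-op updates instead of corrupting the state, which is what lets the recurrence run far past the training context length.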
DUSt3R was never trained to do dynamic segmentation with GT masks, right? It was just trained to regress point maps on 3D datasets—yet dynamic awareness emerged, making DUSt3R a zero-shot 4D estimator!😀
April 2, 2025 at 7:59 AM
With our estimated segmentation masks, we perform a second inference pass by re-weighting the attention, enabling robust 4D reconstruction and even outperforming SOTA methods trained on 4D datasets, with almost no extra cost compared to vanilla DUSt3R.
April 1, 2025 at 3:25 PM
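One way to picture the second-pass re-weighting: down-weight attention toward key tokens that the estimated masks flag as dynamic, so static geometry dominates the matching. This NumPy sketch is a simplified single-head attention with a hypothetical `suppress` multiplier; the actual re-weighting scheme in the model may differ:

```python
import numpy as np

def reweighted_attention(q, k, v, dyn_mask, suppress=0.0):
    """Attention pass that down-weights keys in dynamic regions.

    q, k, v:  (N, D) query / key / value tokens
    dyn_mask: (N,)   1.0 where a key token was segmented as dynamic
    suppress: multiplier applied to attention toward dynamic keys
    """
    scores = q @ k.T / np.sqrt(q.shape[1])                 # (N, N)
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    # re-weight: suppress attention flowing toward dynamic tokens
    weights = weights * np.where(dyn_mask[None, :] == 1.0, suppress, 1.0)
    weights = weights / weights.sum(axis=1, keepdims=True)  # renormalize
    return weights @ v
```

Since this only rescales existing attention weights during a second forward pass, it adds essentially no cost over running the frozen network twice.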
We propose an attention-guided strategy to decompose dynamic objects from the static background, enabling robust dynamic object segmentation. It outperforms optical-flow-guided segmentation methods such as MonST3R, as well as models trained on dynamic mask labels such as DAS3R.
April 1, 2025 at 3:24 PM
💡Humans naturally separate ego-motion from object-motion without dynamic labels. We observe that #DUSt3R has implicitly learned a similar mechanism, reflected in its attention layers.
April 1, 2025 at 3:23 PM