Lorenzo Mur
murloren.bsky.social
PhD student at the University of Zaragoza in Deep Learning and Computer Vision
If you are interested, code and weights are already available!!
Paper: arxiv.org/pdf/2503.08344 📜
Code: github.com/lmur98/DIV_F... 💻
March 19, 2025 at 2:15 PM
Moreover, distilling video-language features lets the model capture detailed affordances, where previous approaches failed because they relied on CLIP features.
March 19, 2025 at 2:15 PM
Our model enables detailed segmentation and maintains a consistent understanding over time. For instance, we can track dynamic objects throughout the video.
March 19, 2025 at 2:15 PM
We propose to decompose the egocentric scene into persistent, dynamic, and actor-based components while integrating both image and video-language features.
March 19, 2025 at 2:15 PM
Egocentric videos are characterized by dynamic interactions and a strong dependence on the wearer's engagement with the environment. Traditional approaches often focus on isolated clips or fail to integrate rich semantic and geometric information, limiting scene comprehension.
March 19, 2025 at 2:15 PM