Gabriele Goletto
gabrigole.bsky.social
Gabriele Goletto
@gabrigole.bsky.social
Research Scientist @ Microsoft.
👨‍💻 https://gabrielegoletto.github.io
Reposted by Gabriele Goletto
Now on ArXiv our
@cvprconference.bsky.social
#CVPR2025 paper
Learning from Streaming Video with Orthogonal Gradients
Instead of shuffling clips, can we learn from videos fed sequentially, where you see a clip once, in order?
How to deal with the correlation of gradients over training?
1/3
April 10, 2025 at 3:04 PM
Reposted by Gabriele Goletto
Image segmentation doesn’t have to be rocket science. 🚀
Why build a rocket engine full of bolted-on subsystems when one elegant unit does the job? 💡
That’s what we did for segmentation.
✅ Meet the Encoder-only Mask Transformer (EoMT): tue-mps.github.io/eomt (CVPR 2025)
(1/6)
March 31, 2025 at 8:35 PM
Reposted by Gabriele Goletto
Excited to release the first worldwide aerial image localization method (and demo!)
Take an aerial or satellite image from anywhere in the world, and AstroLoc can (probably) find its location, and provide a precise footprint!
Links to paper, demo and full-length (5 min) video ⬇️
February 14, 2025 at 10:32 AM
Reposted by Gabriele Goletto
🛑📢
HD-EPIC: A Highly-Detailed Egocentric Video Dataset
hd-epic.github.io
arxiv.org/abs/2502.04144
New collected videos
263 annotations/min: recipe, nutrition, actions, sounds, 3D object movement &fixture associations, masks.
26K VQA benchmark to challenge current VLMs
1/N
February 7, 2025 at 11:45 AM
Reposted by Gabriele Goletto
Now on ArXiv
ShowHowTo: Generating Scene-Conditioned Step-by-Step Visual Instructions
arxiv.org/abs/2412.01987
soczech.github.io/showhowto/
Given one real image &variable sequence of text instructions, ShowHowTo generates a multi-step sequence of images *conditioned on the scene in the REAL image*
🧵
December 5, 2024 at 3:01 PM