Deep Learning for Computer Vision.
Strengthen your ViTs: https://github.com/davnords/octic-vits
I wonder if it is just a data issue for VGGT or if it is deeper than that. I mean VGGT was trained on mostly synthetic data. What do you think?
We do so by constructing ViTs that use these computational savings early in the network and use regular blocks in the latter half.
TL;DR: We extend @bokmangeorg.bsky.social's reflection-equivariant ViTs to the (octic) group of 90-degree rotations and reflections and... it just works... (DINOv2+DeiT)
Code: github.com/davnords/octic-vits
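For readers unfamiliar with the octic group mentioned above, here is a minimal sketch of its eight elements (four 90-degree rotations, each with and without a mirror flip) acting on a small square grid. It is pure Python and is not taken from the octic-vits repository; the helper names `rot90`, `flip` and `octic_orbit` are hypothetical and chosen only for illustration.

```python
def rot90(grid):
    """Rotate a square grid by 90 degrees counter-clockwise."""
    n = len(grid)
    return tuple(tuple(grid[j][n - 1 - i] for j in range(n)) for i in range(n))


def flip(grid):
    """Mirror a grid left-to-right."""
    return tuple(tuple(reversed(row)) for row in grid)


def octic_orbit(grid):
    """Return the 8 images of `grid` under rotations and reflections.

    Hypothetical helper for illustration: the first four entries are the
    rotations of the grid, the last four are the rotations of its mirror image.
    """
    grid = tuple(tuple(row) for row in grid)
    orbit = []
    for start in (grid, flip(grid)):
        g = start
        for _ in range(4):
            orbit.append(g)
            g = rot90(g)
    return orbit


if __name__ == "__main__":
    for g in octic_orbit([[1, 2], [3, 4]]):
        print(g)
```

A layer is equivariant to this group if transforming its input by any of these eight elements transforms its output in a matching, predictable way. The sketch only enumerates the group action on a grid; it says nothing about how the equivariant ViT blocks themselves are built.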