David Nordström
davnords.bsky.social
PhD student @ Chalmers

Deep Learning for Computer Vision.
Strengthen your ViTs: https://github.com/davnords/octic-vits
November 13, 2025 at 8:38 AM
October 23, 2025 at 5:35 AM
These gentlemen show that not only COLMAP but also VGGT fails on spherical motion in their ICCV oral paper "Uncalibrated Structure from Motion on a Sphere".

I wonder if it is just a data issue for VGGT or if it goes deeper than that. I mean, VGGT was trained mostly on synthetic data. What do you think?
October 22, 2025 at 12:21 AM
Obligatory picture with the poster :)
July 16, 2025 at 2:35 AM
Presented with my co-supervisor @bokmangeorg.bsky.social at #ICML25, fun time!
July 16, 2025 at 2:22 AM
Perhaps you'd be interested in our work on an adjacent topic. Instead of optimizing the shape of ViTs, we propose a sparse structure for the linear layers, induced by the inductive bias of octic equivariance, which significantly reduces FLOPs. Shameless self-plug :) (arxiv.org/abs/2505.15441)
June 1, 2025 at 8:39 PM
Btw @bokmangeorg.bsky.social made some illuminating visualizations to show how our PatchEmbed leads to features that transform predictably under octic actions.
May 23, 2025 at 8:07 AM
In particular, we achieve a roughly 40% reduction in FLOPs and increased throughput while improving performance on both supervised and unsupervised vision tasks.

We do so by constructing ViTs that spend these computational savings early in the network, with octic blocks in the first half and regular blocks in the latter half.
May 23, 2025 at 8:04 AM
We run a kernel-constrained PatchEmbed to map an image to steerable ViT features, and we use the block-diagonalization of linear layers induced by the isotypical decomposition to achieve computational gains.
May 23, 2025 at 8:02 AM
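To make the block-diagonalization in the post above concrete, here is a minimal NumPy sketch. It is a toy under stated assumptions, not the octic-vits implementation: it uses only the two-element reflection group instead of the full octic group, and the feature dimension `d`, the reflection matrix `P`, and the basis `Q` are all illustrative choices. It shows that a linear map commuting with the group action splits into independent blocks once features are expressed in a basis of even and odd vectors.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4                      # toy feature dimension (assumed even)
h = d // 2

# Assumed group action on features: a reflection that reverses the coordinates.
P = np.eye(d)[::-1]

# Turn an arbitrary linear layer into an equivariant one by averaging
# over the group {I, P}; afterwards W_eq commutes with P.
W = rng.standard_normal((d, d))
W_eq = 0.5 * (W + P @ W @ P)

# Orthonormal basis adapted to the group: the first h rows are even
# vectors (P v = v), the last h rows are odd vectors (P v = -v).
I_h, J_h = np.eye(h), np.eye(h)[::-1]
Q = np.vstack([np.hstack([I_h, J_h]),
               np.hstack([I_h, -J_h])]) / np.sqrt(2)

# In this basis the equivariant layer is block-diagonal:
# the even and odd parts never mix.
B = Q @ W_eq @ Q.T
print(np.round(B, 3))
```

In this toy case the dense layer needs d^2 multiplications per feature vector, while the two h-by-h blocks need d^2 / 2. The octic group has more irreducible representations and hence a finer block structure, so the savings quoted in the thread are not derived from this sketch.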
Want stronger Vision Transformers? Use octic-equivariant layers (arxiv.org/abs/2505.15441).

TL;DR: We extend @bokmangeorg.bsky.social's reflection-equivariant ViTs to the (octic) group of 90-degree rotations and reflections and... it just works... (DINOv2+DeiT)

Code: github.com/davnords/octic-vits
May 23, 2025 at 7:38 AM
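For readers unfamiliar with the octic group mentioned in the post above, here is a small NumPy sketch that enumerates its eight elements as operations on an image. The function name `octic_orbit` and the 3x3 test image are hypothetical and chosen for illustration only; this does not reflect how the linked repository represents the group.

```python
import numpy as np

def octic_orbit(img):
    """Return the 8 images produced by 90-degree rotations and reflections."""
    # Four rotations by 0, 90, 180 and 270 degrees.
    rotations = [np.rot90(img, k) for k in range(4)]
    # Each rotation followed by a left-right flip gives the four reflections.
    reflections = [np.fliplr(r) for r in rotations]
    return rotations + reflections

# An image with all-distinct pixel values has no symmetry,
# so its orbit contains 8 different images.
img = np.arange(9).reshape(3, 3)
orbit = octic_orbit(img)
for transformed in orbit:
    print(transformed, end="\n\n")
```

An octic-equivariant layer is one whose output transforms in a predictable way when the input is replaced by any of these eight versions.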
March 28, 2025 at 1:37 AM
GPT-4o's new image capabilities seem to be well liked. OpenAI seems to insinuate that the model is not based on diffusion. I wonder how their work relates to the infamous NeurIPS paper "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction" (arxiv.org/abs/2404.02905)
March 28, 2025 at 1:32 AM