Lightnews — Scholar-powered news

David Nordström

@davnords.bsky.social

56 followers 87 following 98 posts

PhD Student @ Chalmers

Deep Learning for Computer Vision.
Strengthen your ViTs: https://github.com/davnords/octic-vits

David Nordström

@davnords.bsky.social

November 13, 2025 at 8:38 AM

David Nordström

@davnords.bsky.social

October 23, 2025 at 5:35 AM

David Nordström

@davnords.bsky.social

These gentlemen show that not only COLMAP but also VGGT fails on spherical motion in their ICCV oral paper "Uncalibrated Structure from Motion on a Sphere".

I wonder if it is just a data issue for VGGT or if it is deeper than that. I mean VGGT was trained on mostly synthetic data. What do you think?

October 22, 2025 at 12:21 AM

David Nordström

@davnords.bsky.social

Obligatory picture with the poster :)

July 16, 2025 at 2:35 AM

David Nordström

@davnords.bsky.social

Presented with my co-supervisor @bokmangeorg.bsky.social at #ICML25, fun time!

July 16, 2025 at 2:22 AM

David Nordström

@davnords.bsky.social

Perhaps you'd be interested in our work on an adjacent topic: instead of optimizing the shape of ViTs, we propose a sparse structure for the linear layers, induced by the inductive bias of octic equivariance, which significantly reduces FLOPs. Shameless self-plug :) (arxiv.org/abs/2505.15441)

June 1, 2025 at 8:39 PM

David Nordström

@davnords.bsky.social

Btw @bokmangeorg.bsky.social made some illuminating visualizations to show how our PatchEmbed leads to features that transform predictably under octic actions.
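"Transform predictably" is the equivariance property: transforming the input and then embedding gives the same result as embedding and then transforming the features. A minimal pure-Python sketch of that check follows. It uses a constant (sum-pooling) patch kernel as a stand-in, which is an assumption for illustration and not the PatchEmbed from the paper; the names `rot90`, `flip`, and `patch_sums` are hypothetical helpers.

```python
def rot90(grid):
    """Rotate a square 2D list by 90 degrees counter-clockwise."""
    return [list(row) for row in zip(*grid)][::-1]


def flip(grid):
    """Mirror a 2D list left-to-right."""
    return [row[::-1] for row in grid]


def patch_sums(img, p):
    """Toy patch embedding: sum the pixels of each non-overlapping p x p patch.

    Assumes a square image whose side is divisible by p. The constant kernel
    is unchanged by rotations and reflections, so the output grid moves the
    same way the image does.
    """
    n = len(img) // p
    return [
        [
            sum(img[i * p + a][j * p + b] for a in range(p) for b in range(p))
            for j in range(n)
        ]
        for i in range(n)
    ]


img = [[r * 4 + c for c in range(4)] for r in range(4)]

# Equivariance: embed(g . image) == g . embed(image) for both generators.
assert patch_sums(rot90(img), 2) == rot90(patch_sums(img, 2))
assert patch_sums(flip(img), 2) == flip(patch_sums(img, 2))
```

In this toy case the feature values are only moved around on the grid. Steerable features, as described in the posts, may additionally mix channels in a known way under each group element.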

May 23, 2025 at 8:07 AM

David Nordström

@davnords.bsky.social

In particular, we achieve c. 40% reduction in FLOPs and increased throughput while improving performance for both supervised and unsupervised vision tasks.

We do so by constructing ViTs that use octic-equivariant blocks, and hence these computational savings, early in the network and regular blocks in the latter half.

May 23, 2025 at 8:04 AM

David Nordström

@davnords.bsky.social

We run a kernel-constrained PatchEmbed to map an image to steerable ViT features and make use of the block-diagonalization of linear layers induced by the isotypical decomposition to achieve computational gains.
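To see why block-diagonalization saves compute, one can count multiply-accumulates (MACs) per token. The sketch below assumes the feature dimension d = 8k consists of k copies of the regular representation of the octic group. That is an assumption made for illustration and is not stated in the posts. Under it, each of the four 1-dimensional irreps appears k times and the 2-dimensional irrep appears 2k times, and by Schur's lemma an equivariant linear map only mixes copies of the same irrep. The function names are hypothetical.

```python
# Dimensions of the five real irreducible representations of the octic
# group (dihedral group of order 8): four 1-dimensional, one 2-dimensional.
IRREP_DIMS = (1, 1, 1, 1, 2)


def dense_macs(d):
    """MACs per token for an unconstrained d x d linear layer."""
    return d * d


def equivariant_macs(k):
    """MACs per token for an equivariant linear layer on d = 8k features.

    Assumption: features are k copies of the regular representation, so an
    irrep of dimension `dim` appears with multiplicity dim * k.
    """
    total = 0
    for dim in IRREP_DIMS:
        mult = dim * k
        # One mult x mult block, applied to each of the `dim` components.
        total += dim * mult * mult
    return total


k = 96  # d = 768, a common ViT-Base width
print(dense_macs(8 * k), equivariant_macs(k))
print(dense_macs(8 * k) / equivariant_macs(k))
```

Under this assumption the linear-layer cost drops from 64k² to 12k² MACs per token. This counts linear layers only, so it is not directly comparable to the c. 40% whole-model figure quoted in the thread.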

May 23, 2025 at 8:02 AM

David Nordström

@davnords.bsky.social

Want stronger Vision Transformers? Use octic-equivariant layers (arxiv.org/abs/2505.15441).

TL;DR: We extend @bokmangeorg.bsky.social's reflection-equivariant ViTs to the (octic) group of 90-degree rotations and reflections and... it just works... (DINOv2+DeiT)
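For readers unfamiliar with the octic group: it has eight elements, namely the four rotations by multiples of 90 degrees, each with or without a reflection. A small pure-Python sketch that enumerates them by acting on a square grid is given below. The helper names are hypothetical and unrelated to the linked repository.

```python
def rot90(grid):
    """Rotate a square 2D list by 90 degrees counter-clockwise."""
    return [list(row) for row in zip(*grid)][::-1]


def flip(grid):
    """Mirror a 2D list left-to-right."""
    return [row[::-1] for row in grid]


def octic_orbit(grid):
    """Return the 8 grids obtained by applying every octic group element."""
    out = []
    cur = [list(row) for row in grid]
    for _ in range(4):
        out.append(cur)        # pure rotation
        out.append(flip(cur))  # rotation followed by a reflection
        cur = rot90(cur)
    return out


img = [[1, 2], [3, 4]]
for g in octic_orbit(img):
    print(g)
```

For a grid with no symmetry of its own, such as the one above, all eight results are distinct, and applying `rot90` or `flip` to any of them yields another member of the same set.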

Code: github.com/davnords/octic-vits

May 23, 2025 at 7:38 AM

David Nordström

@davnords.bsky.social

March 28, 2025 at 1:37 AM

David Nordström

@davnords.bsky.social

GPT-4o's new image capabilities seem to be well liked. The insinuation from OpenAI seems to be that they are not based on diffusion. I wonder how their work relates to the infamous NeurIPS paper "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction" (arxiv.org/abs/2404.02905)

March 28, 2025 at 1:32 AM
