David Nordström
@davnords.bsky.social
PhD student @ Chalmers

Deep Learning for Computer Vision.
Strengthen your ViTs: https://github.com/davnords/octic-vits
Reposted by David Nordström
Turns out NLP is just vision
Z.ai released a paper very similar to DeepSeek-OCR on the same exact day (a few hours earlier afaict)

Glyph is just a framework, not a model, but they got Qwen3-8B (128k context) to handle over 1 million tokens of context by rendering the input as images

arxiv.org/abs/2510.17800
October 21, 2025 at 4:39 PM
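A rough sketch of the idea in the post above, assuming nothing about Glyph's actual pipeline: render long text as page images and hand those pages to whatever long-context VLM you have. All names and sizes below are illustrative, not from the paper.

```python
from PIL import Image, ImageDraw, ImageFont
import textwrap

# Toy version of "text as images": chunk a long document into pages and
# rasterize each page with PIL. Each page image then stands in for
# chars_per_line * lines_per_page characters of raw text.
def render_pages(text, chars_per_line=80, lines_per_page=60, font_size=14):
    font = ImageFont.load_default()
    lines = textwrap.wrap(text, width=chars_per_line)
    pages = []
    for i in range(0, len(lines), lines_per_page):
        img = Image.new("RGB", (chars_per_line * 8, lines_per_page * (font_size + 4)), "white")
        draw = ImageDraw.Draw(img)
        for j, line in enumerate(lines[i:i + lines_per_page]):
            draw.text((8, j * (font_size + 4)), line, fill="black", font=font)
        pages.append(img)
    return pages

# pages = render_pages(open("long_doc.txt").read())
# The pages would then be fed to the VLM in place of the raw token stream.
```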
These gentlemen show that not only COLMAP but also VGGT fails on spherical motion in their ICCV oral paper "Uncalibrated Structure from Motion on a Sphere".

I wonder if it is just a data issue for VGGT or if it goes deeper than that. I mean, VGGT was trained mostly on synthetic data. What do you think?
October 22, 2025 at 12:21 AM
Reposted by David Nordström
Pro tip: For good Halloween vibes, use non-normalized RoPE on images larger than your training resolution and larger than the composite period of some of the RoPE-rotations. You might get scary ghost structures in your features.
October 16, 2025 at 2:53 PM
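A toy numpy sketch of the failure mode hinted at above, under my own assumptions (1D positions, a small RoPE base purely so the wrap-around shows up at toy scale): once positions run past the periods of the rotation frequencies, far-away positions alias back onto early ones, which is where the "ghosts" come from.

```python
import numpy as np

# Unnormalized RoPE angles: position * per-pair frequency.
def rope_angles(positions, dim=8, base=100.0):
    freqs = 1.0 / base ** (np.arange(0, dim, 2) / dim)
    return positions[:, None] * freqs[None, :]

train_len, test_len = 16, 256
angles = rope_angles(np.arange(test_len, dtype=np.float64))
emb = np.concatenate([np.cos(angles), np.sin(angles)], axis=-1)

# Similarity of every position's embedding to position 0. High peaks far
# beyond train_len are positions that alias back onto the start, i.e. the
# "ghost" copies of early structure.
sim = emb @ emb[0] / (np.linalg.norm(emb, axis=1) * np.linalg.norm(emb[0]))
print(np.argsort(sim[train_len:])[::-1][:5] + train_len)
```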
Reposted by David Nordström
RoMa is now on PyPI under the name `romatch`
September 23, 2025 at 7:56 PM
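A usage sketch, going from memory of the RoMa README rather than from checking the package, so treat the exact import path and signatures as assumptions:

```python
# pip install romatch
import torch
from romatch import roma_outdoor  # assumed import, following the RoMa README

device = "cuda" if torch.cuda.is_available() else "cpu"
roma_model = roma_outdoor(device=device)

# Dense warp + per-pixel certainty between two images (paths assumed).
warp, certainty = roma_model.match("imA.jpg", "imB.jpg", device=device)

# Sample sparse correspondences; conversion to pixel coordinates would use the
# image sizes, e.g. roma_model.to_pixel_coordinates(matches, H_A, W_A, H_B, W_B).
matches, certainty = roma_model.sample(warp, certainty)
```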
Reposted by David Nordström
Towards the Next Generation of 3D Reconstruction

@parskatt.bsky.social PhD Thesis.

tl;dr: would be useful for teaching image matching, nice explanations. (Too) fancy and stylish notation. Cool acknowledgements section and cover image.

liu.diva-portal.org/smash/record...
September 18, 2025 at 6:25 AM
Reposted by David Nordström
And here is a link to the thesis itself: liu.diva-portal.org/smash/record...
Towards the Next Generation of 3D Reconstruction
liu.diva-portal.org
September 17, 2025 at 6:31 AM
Reposted by David Nordström
How to name your method: a comprehensive flow chart
September 13, 2025 at 3:32 PM
Reposted by David Nordström
Born too late to explore the earth.
Born too early to explore the galaxy.
Born just in time to \nabla_{\theta}f_{\theta}
July 26, 2025 at 1:38 AM
Presented with my co-supervisor @bokmangeorg.bsky.social at #ICML25, fun time!
July 16, 2025 at 2:22 AM
Reposted by David Nordström
Tomorrow at ICML [Tuesday 4:30 pm, poster W-213] @davnords.bsky.social and I will present our spotlighted flop paper. Come by and let us try to convince you that equivariant nets should be standard in vision tasks due to computational benefits! bsky.app/profile/bokm...
July 15, 2025 at 4:32 AM
I am at ICML, happy to connect with all of you :)
July 14, 2025 at 2:03 PM
Reposted by David Nordström
And today I'm presenting this work at #CVPR2025!
🗓️ Date: 16:00-18:00, Fri, Jun 13 (Today)
📍Place: Poster #115 in Session 2 (ExHall D)
💻 Code: github.com/ericssonrese...
ColabSfM: Collaborative Structure-from-Motion by Point Cloud Registration

@parskatt.bsky.social, André Mateus, Alberto Jaenal
tl;dr: it's in the title: learning to register SfM point clouds.
arxiv.org/abs/2503.17093
June 13, 2025 at 3:02 PM
Reposted by David Nordström
As models become larger, more of their compute is spent in the MLP.
Turns out that this is perfect for octic equivariance, as our biggest gains are there!
May 23, 2025 at 9:51 AM
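A back-of-the-envelope check of that claim, counting only the big matmuls in one ViT block (my own rough accounting; norms, biases, and softmax are ignored, and 257 tokens is just an assumed sequence length):

```python
# Multiply-accumulates per ViT block: attention is the qkv + output
# projections plus the two N x N matmuls; the MLP is two dim <-> 4*dim
# projections (MLP ratio 4).
def mlp_fraction(num_tokens, dim):
    attn = 4 * num_tokens * dim**2 + 2 * num_tokens**2 * dim
    mlp = 8 * num_tokens * dim**2
    return mlp / (attn + mlp)

for name, dim in [("ViT-S", 384), ("ViT-B", 768), ("ViT-L", 1024), ("ViT-g", 1408)]:
    print(name, round(mlp_fraction(num_tokens=257, dim=dim), 2))
# The MLP's share climbs toward 2/3 of the matmul compute as the width grows.
```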
Want stronger Vision Transformers? Use octic-equivariant layers (arxiv.org/abs/2505.15441).

TL;DR: We extend @bokmangeorg.bsky.social's reflection-equivariant ViTs to the (octic) group of 90-degree rotations and reflections and... it just works... (DINOv2+DeiT)

Code: github.com/davnords/octic-vits
May 23, 2025 at 7:38 AM
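For intuition on what "octic" means here: the group is D4, the 8 symmetries of the square (4 rotations of 90 degrees, each optionally mirrored). The snippet below just enumerates that group acting on an image tensor; it is a generic illustration, not the octic-vits API.

```python
import torch

# Enumerate the octic (D4) orbit of an image batch: 4 rotations by 90 degrees,
# each with and without a horizontal mirror. Assumes square inputs (H == W).
def octic_orbit(x):  # x: (B, C, H, W)
    orbit = []
    for k in range(4):
        rotated = torch.rot90(x, k, dims=(-2, -1))
        orbit.append(rotated)
        orbit.append(torch.flip(rotated, dims=[-1]))
    return orbit  # 8 transformed copies of x
```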
GPT-4o's new image capabilities seem to be well received. OpenAI seems to insinuate that it is not based on diffusion. I wonder how their work relates to the infamous NeurIPS paper "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction" (arxiv.org/abs/2404.02905)
March 28, 2025 at 1:32 AM
Reposted by David Nordström
New paper!
We merge SfM reconstructions with point cloud registration.

Link: arxiv.org/abs/2503.17093
Code: Not yet public, but coming later.
March 24, 2025 at 9:49 AM
Reposted by David Nordström
New paper (arxiv.org/abs/2503.13433)! We look into improving the threshold robustness of Random Sample Consensus (RANSAC) through (less biased) inlier noise scale estimation.
March 18, 2025 at 4:48 AM
Reposted by David Nordström
Introducing VGGT (CVPR'25), a feedforward Transformer that directly infers all key 3D attributes from one, a few, or hundreds of images, in seconds!

Project Page: vgg-t.github.io
Code & Weights: github.com/facebookrese...
March 17, 2025 at 2:08 AM
Reposted by David Nordström
Introducing DaD, Part 2, a pretty cool keypoint detector.
Introducing DaD (arxiv.org/abs/2503.07347), a pretty cool keypoint detector.
As this will get pretty long, it will be split into two threads.
The first will go into the RL part, and the second into the emergence and distillation.
March 11, 2025 at 4:00 AM
Reposted by David Nordström
We made a new keypoint detector named DaD. The paper isn't up yet, but the code and weights are:
github.com/Parskatt/dad
March 10, 2025 at 7:53 AM
Reposted by David Nordström
Common beliefs about equivariant networks for image input include: 1) they are slow, 2) they don't scale to ImageNet, 3) they are complicated. In my opinion, all three are false. To argue against them, we made minimal modifications to popular vision models, turning them mirror-equivariant.
February 10, 2025 at 7:35 AM
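For readers who want the cheapest possible way to see what mirror equivariance asks of a model: the generic symmetrization trick below makes any image-to-feature-map function flip-equivariant by construction. It is only an illustration of the constraint; the post is about baking the property into the layers themselves, which avoids the extra forward pass.

```python
import torch

# Generic symmetrization: average the function with its conjugate under a
# horizontal flip. The result satisfies g(flip(x)) == flip(g(x)) exactly.
def mirror_equivariant(f):
    flip = lambda t: torch.flip(t, dims=[-1])
    def g(x):  # x: (B, C, H, W); f(x) assumed to be a same-layout feature map
        return 0.5 * (f(x) + flip(f(flip(x))))
    return g
```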