David Nordström
@davnords.bsky.social
PhD student @ Chalmers

Deep Learning for Computer Vision.
Strengthen your ViTs: https://github.com/davnords/octic-vits
Reposted by David Nordström
Turns out NLP is just vision
Z.ai released a paper very similar to DeepSeek-OCR on the same exact day (a few hours earlier afaict)

Glyph is just a framework, not a model, but they got Qwen3-8B (128k context) to handle over 1 million tokens of context by rendering the input as images

arxiv.org/abs/2510.17800
October 21, 2025 at 4:39 PM
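A rough sketch of the idea in the post above, assuming nothing about Glyph's actual pipeline: render long text as page images and hand those pages to whatever long-context VLM you have. All names and sizes below are illustrative, not from the paper.

```python
from PIL import Image, ImageDraw, ImageFont
import textwrap

# Toy version of "text as images": chunk a long document into pages and
# rasterize each page with PIL. Each page image then stands in for
# chars_per_line * lines_per_page characters of raw text.
def render_pages(text, chars_per_line=80, lines_per_page=60, font_size=14):
    font = ImageFont.load_default()
    lines = textwrap.wrap(text, width=chars_per_line)
    pages = []
    for i in range(0, len(lines), lines_per_page):
        img = Image.new("RGB", (chars_per_line * 8, lines_per_page * (font_size + 4)), "white")
        draw = ImageDraw.Draw(img)
        for j, line in enumerate(lines[i:i + lines_per_page]):
            draw.text((8, j * (font_size + 4)), line, fill="black", font=font)
        pages.append(img)
    return pages

# pages = render_pages(open("long_doc.txt").read())
# The pages would then be fed to the VLM in place of the raw token stream.
```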
These gentlemen show that not only COLMAP but also VGGT fails on spherical motion in their ICCV oral paper "Uncalibrated Structure from Motion on a Sphere".

I wonder if it is just a data issue for VGGT or if it goes deeper than that. I mean, VGGT was trained mostly on synthetic data. What do you think?
October 22, 2025 at 12:21 AM
Reposted by David Nordström
Pro tip: For good Halloween vibes, use non-normalized RoPE on images larger than your training resolution and larger than the composite period of some of the RoPE-rotations. You might get scary ghost structures in your features.
October 16, 2025 at 2:53 PM
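A toy numpy sketch of the failure mode hinted at above, under my own assumptions (1D positions, a small RoPE base purely so the wrap-around shows up at toy scale): once positions run past the periods of the rotation frequencies, far-away positions alias back onto early ones, which is where the "ghosts" come from.

```python
import numpy as np

# Unnormalized RoPE angles: position * per-pair frequency.
def rope_angles(positions, dim=8, base=100.0):
    freqs = 1.0 / base ** (np.arange(0, dim, 2) / dim)
    return positions[:, None] * freqs[None, :]

train_len, test_len = 16, 256
angles = rope_angles(np.arange(test_len, dtype=np.float64))
emb = np.concatenate([np.cos(angles), np.sin(angles)], axis=-1)

# Similarity of every position's embedding to position 0. High peaks far
# beyond train_len are positions that alias back onto the start, i.e. the
# "ghost" copies of early structure.
sim = emb @ emb[0] / (np.linalg.norm(emb, axis=1) * np.linalg.norm(emb[0]))
print(np.argsort(sim[train_len:])[::-1][:5] + train_len)
```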
Reposted by David Nordström
RoMa is now on PyPI under the name `romatch`
September 23, 2025 at 7:56 PM
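A usage sketch, going from memory of the RoMa README rather than from checking the package, so treat the exact import path and signatures as assumptions:

```python
# pip install romatch
import torch
from romatch import roma_outdoor  # assumed import, following the RoMa README

device = "cuda" if torch.cuda.is_available() else "cpu"
roma_model = roma_outdoor(device=device)

# Dense warp + per-pixel certainty between two images (paths assumed).
warp, certainty = roma_model.match("imA.jpg", "imB.jpg", device=device)

# Sample sparse correspondences; conversion to pixel coordinates would use the
# image sizes, e.g. roma_model.to_pixel_coordinates(matches, H_A, W_A, H_B, W_B).
matches, certainty = roma_model.sample(warp, certainty)
```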
Reposted by David Nordström
Towards the Next Generation of 3D Reconstruction

@parskatt.bsky.social PhD Thesis.

tl;dr: would be useful for teaching image matching, nice explanations. (Too) fancy and stylish notation. Cool acknowledgements section and cover image.

liu.diva-portal.org/smash/record...
September 18, 2025 at 6:25 AM
Reposted by David Nordström
And here is a link to the thesis itself: liu.diva-portal.org/smash/record...
Towards the Next Generation of 3D Reconstruction
liu.diva-portal.org
September 17, 2025 at 6:31 AM
Reposted by David Nordström
How to name your method: a comprehensive flow chart
September 13, 2025 at 3:32 PM
Reposted by David Nordström
Born too late to explore the earth.
Born too early to explore the galaxy.
Born just in time to \nabla_{\theta}f_{\theta}
July 26, 2025 at 1:38 AM
Presented with my co-supervisor @bokmangeorg.bsky.social at #ICML25, fun time!
July 16, 2025 at 2:22 AM
Reposted by David Nordström
Tomorrow at ICML [Tuesday 4:30 pm, poster W-213] @davnords.bsky.social and I will present our spotlighted flop paper. Come by and let us try to convince you that equivariant nets should be standard in vision tasks due to computational benefits! bsky.app/profile/bokm...
July 15, 2025 at 4:32 AM
I am at ICML, happy to connect with all of you :)
July 14, 2025 at 2:03 PM
Reposted by David Nordström
And today I'm presenting this work at #CVPR2025!
🗓️ Date: 16:00-18:00, Fri, Jun 13 (Today)
📍Place: Poster #115 in Session 2 (ExHall D)
💻 Code: github.com/ericssonrese...
ColabSfM: Collaborative Structure-from-Motion by Point Cloud Registration

@parskatt.bsky.social, André Mateus, Alberto Jaenal
tl;dr: it's in the title: learning to register SfM point clouds.
arxiv.org/abs/2503.17093
June 13, 2025 at 3:02 PM
Reposted by David Nordström
As models become larger, more of their compute is spent in the MLP.
Turns out that this is perfect for octic equivariance, as our biggest gains are there!
May 23, 2025 at 9:51 AM
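A back-of-the-envelope check of that claim, counting only the big matmuls in one ViT block (my own rough accounting; norms, biases, and softmax are ignored, and 257 tokens is just an assumed sequence length):

```python
# Multiply-accumulates per ViT block: attention is the qkv + output
# projections plus the two N x N matmuls; the MLP is two dim <-> 4*dim
# projections (MLP ratio 4).
def mlp_fraction(num_tokens, dim):
    attn = 4 * num_tokens * dim**2 + 2 * num_tokens**2 * dim
    mlp = 8 * num_tokens * dim**2
    return mlp / (attn + mlp)

for name, dim in [("ViT-S", 384), ("ViT-B", 768), ("ViT-L", 1024), ("ViT-g", 1408)]:
    print(name, round(mlp_fraction(num_tokens=257, dim=dim), 2))
# The MLP's share climbs toward 2/3 of the matmul compute as the width grows.
```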
Want stronger Vision Transformers? Use octic-equivariant layers (arxiv.org/abs/2505.15441).

TL;DR: We extend @bokmangeorg.bsky.social's reflection-equivariant ViTs to the (octic) group of 90-degree rotations and reflections and... it just works... (DINOv2+DeiT)

Code: github.com/davnords/octic-vits
May 23, 2025 at 7:38 AM
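For intuition on what "octic" means here: the group is D4, the 8 symmetries of the square (4 rotations of 90 degrees, each optionally mirrored). The snippet below just enumerates that group acting on an image tensor; it is a generic illustration, not the octic-vits API.

```python
import torch

# Enumerate the octic (D4) orbit of an image batch: 4 rotations by 90 degrees,
# each with and without a horizontal mirror. Assumes square inputs (H == W).
def octic_orbit(x):  # x: (B, C, H, W)
    orbit = []
    for k in range(4):
        rotated = torch.rot90(x, k, dims=(-2, -1))
        orbit.append(rotated)
        orbit.append(torch.flip(rotated, dims=[-1]))
    return orbit  # 8 transformed copies of x
```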
GPT-4o's new image capabilities seem to be well received. OpenAI seems to insinuate that it is not based on diffusion. I wonder how their work relates to the infamous NeurIPS paper "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction" (arxiv.org/abs/2404.02905)
March 28, 2025 at 1:32 AM
Reposted by David Nordström
New paper!
We merge SfM reconstructions with point cloud registration.

Link: arxiv.org/abs/2503.17093
Code: Not yet public, but coming later.
March 24, 2025 at 9:49 AM
Reposted by David Nordström
New paper (arxiv.org/abs/2503.13433)! We look into improving the threshold robustness of Random Sample Consensus (RANSAC) through (less biased) inlier noise scale estimation.
March 18, 2025 at 4:48 AM
Reposted by David Nordström
Introducing VGGT (CVPR'25), a feedforward Transformer that directly infers all key 3D attributes from one, a few, or hundreds of images, in seconds!

Project Page: vgg-t.github.io
Code & Weights: github.com/facebookrese...
March 17, 2025 at 2:08 AM
Reposted by David Nordström
Introducing DaD, Part 2, a pretty cool keypoint detector.
Introducing DaD (arxiv.org/abs/2503.07347), a pretty cool keypoint detector.
As this will get pretty long, it will be split into two threads.
The first will go into the RL part, and the second into the emergence and distillation.
March 11, 2025 at 4:00 AM
Reposted by David Nordström
We made a new keypoint detector named DaD. The paper isn't up yet, but the code and weights are:
github.com/Parskatt/dad
March 10, 2025 at 7:53 AM
Reposted by David Nordström
Common beliefs about equivariant networks for image input include: 1) they are slow, 2) they don't scale to ImageNet, 3) they are complicated. In my opinion, all three are false. To argue against them, we made minimal modifications to popular vision models, turning them mirror-equivariant.
February 10, 2025 at 7:35 AM
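For readers who want the cheapest possible way to see what mirror equivariance asks of a model: the generic symmetrization trick below makes any image-to-feature-map function flip-equivariant by construction. It is only an illustration of the constraint; the post is about baking the property into the layers themselves, which avoids the extra forward pass.

```python
import torch

# Generic symmetrization: average the function with its conjugate under a
# horizontal flip. The result satisfies g(flip(x)) == flip(g(x)) exactly.
def mirror_equivariant(f):
    flip = lambda t: torch.flip(t, dims=[-1])
    def g(x):  # x: (B, C, H, W); f(x) assumed to be a same-layout feature map
        return 0.5 * (f(x) + flip(f(flip(x))))
    return g
```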