Paul Gavrikov
@paulgavrikov.bsky.social
PostDoc Tübingen AI Center | Machine Learning & Computer Vision
paulgavrikov.github.io
Agree 100%! I think this paper does a great job of outlining issues in the original paper.
October 8, 2025 at 7:43 PM
If you think of texture as the material/surface property (which I think is the original perspective), then the ablation in this paper is insufficient to suppress the cue.
October 8, 2025 at 4:43 PM
I really liked the thoroughness of this paper, but I'm afraid the results build on a shaky definition of "texture". If you replace "texture" in the original paper with "local details", it's virtually the same finding.
October 8, 2025 at 4:43 PM
4) Models answer consistently for easy questions ("Is it day?": yes, "Is it night?": no) but fall back to guessing for hard tasks such as reasoning. Concerningly, some models even fall below random chance, hinting at shortcuts.
October 1, 2025 at 1:17 PM
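For readers curious what the consistency check looks like in practice, here is a minimal sketch, assuming answers to complementary yes/no question pairs have already been collected; the data layout and names are hypothetical, not the benchmark's actual schema:

def consistency_rate(pairs):
    # pairs: model answers (pos, neg) to complementary questions,
    # e.g. ("Is it day?", "Is it night?"). A consistent model gives
    # opposite answers to the two questions.
    consistent = sum(1 for pos, neg in pairs if pos != neg)
    return consistent / len(pairs)

def above_chance(accuracy, n_options=2):
    # A binary question has a 1/n_options random-guessing baseline;
    # accuracy persistently below it hints at a systematic shortcut.
    return accuracy > 1.0 / n_options

answers = [("yes", "no"), ("yes", "yes"), ("no", "yes")]
print(f"consistency: {consistency_rate(answers):.2f}")  # 0.67
print(above_chance(0.42))  # False: below the 50% binary baseline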
3) Similar trends for OCR. Our OCR questions contain constraints (e.g., the fifth word) that models often fail to consider. Common minor errors include a strong tendency to autocorrect typos or to hallucinate more common spellings, especially for non-Latin scripts and non-English text.
October 1, 2025 at 1:17 PM
2) Models cannot count in dense scenes, and performance degrades as the number of objects grows; they typically "undercount", and the errors are massive. Here is the distribution over all models:
October 1, 2025 at 1:17 PM
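For context, a minimal sketch of the signed counting error underlying a distribution like the one above; preds and targets here are made-up illustrative values, not benchmark data:

def signed_errors(preds, targets):
    # Negative values mean the model undercounted.
    return [p - t for p, t in zip(preds, targets)]

print(signed_errors(preds=[8, 40, 3], targets=[12, 95, 3]))
# [-4, -55, 0] -> heavy undercounting in the dense scenes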
1) Our benchmark is hard: the best model (o3) achieves an accuracy of 69.5% in total, but only 19.6% on the hardest split. We observe significant performance drops on some tasks.
October 1, 2025 at 1:17 PM
Our questions are built on top of a fresh dataset of 150 high-resolution, highly detailed scenes probing core vision skills in 6 categories: counting, OCR, reasoning, and activity, attribute, and global scene recognition. The ground truth is private, and our eval server is live!
October 1, 2025 at 1:17 PM
Joint work with Wei Lin, Jehanzeb Mirza, Soumya Jahagirdar, Muhammad Huzaifa, Sivan Doveh, James Glass, and Hilde Kuehne.
September 8, 2025 at 3:28 PM
Paper coming soon! In the meantime:
• Try your model: huggingface.co/spaces/paulg...
• Dataset: huggingface.co/datasets/pau...
• Code: github.com/paulgavrikov...
September 8, 2025 at 3:28 PM
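If you want to try your own model, a rough sketch of the loop might look like the following, assuming the Hugging Face datasets library. The dataset ID, field names, and my_vlm are hypothetical placeholders (the real links are truncated above), and the exact submission format is defined by the eval server:

import json
from datasets import load_dataset

# Hypothetical dataset ID; see the truncated Dataset link above for the real one.
ds = load_dataset("paulgavrikov/visualoverload", split="test")

predictions = {}
for example in ds:
    image = example["image"]        # assumed field: the high-res scene
    question = example["question"]  # assumed field: the question text
    predictions[example["id"]] = my_vlm(image, question)  # plug in your model

# Ground truth is private, so scoring happens server-side: save the
# predictions and upload them via the Hugging Face space linked above.
with open("predictions.json", "w") as f:
    json.dump(predictions, f)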
🤖 We tested 37 models. Results?
Even top VLMs break down on “easy” tasks in overloaded scenes.

Best model (o3):
• 19.8% accuracy (hardest split)
• 69.5% overall
September 8, 2025 at 3:28 PM
📊 VisualOverload =
• 2,720 Q–A pairs
• 6 vision tasks
• 150 fresh, high-res, royalty-free artworks
• Privately held ground-truth responses
September 8, 2025 at 3:28 PM
It was truly special reconnecting with old friends and making so many new ones. Beyond the conference halls, we had some unforgettable adventures — exploring the city, visiting the woodlands, and singing our hearts out at karaoke nights. 🎤🦁🌳
May 3, 2025 at 10:03 AM
Looking forward to meeting you!
April 24, 2025 at 1:45 AM