Lightnews — Scholar-powered news

Anh (Totti) Nguyen

@anh-ng8.bsky.social

In search of an intelligent and explainable AI. Machine Learning, Human-Computer Interaction, and Javascript. Associate Professor at Auburn U.
🌐 https://anhnguyen.me/
🐦 https://x.com/anh_ng8

8️⃣ Work led by the super duo
✨ An Võ + Khải Nguyên Nguyễn ✨
w/ countless assists from Mohammad Taesiri, Tường Vy Đặng & Prof. Daeyoung Kim.

Code & data: vlmsarebiased.github.io
Paper: arxiv.org/abs/2505.23941

inspired by vlmsareblind.github.io

Thank you for any feedback 🙏

8/8

June 5, 2025 at 7:28 PM

7️⃣ On a task that we create from scratch:

Q: Count the circles in cell C3.
🤖: 3 ❌

VLMs are only ~22% accurate, and their answers are biased toward the surrounding cells.

7/8

June 5, 2025 at 7:28 PM

6️⃣ Optical illusions are an interesting task. VLMs know all 6 illusions and their expected answers.

But here we modify the Ebbinghaus pattern so that the two inner circles clearly differ in size. And...

o3: equal ❌
Sonnet 3.7: equal ❌

6/8

June 5, 2025 at 7:28 PM

5️⃣ Bias exists across SIX domains of decreasing popularity (animals -> logos -> flags -> chess pieces -> optical illusions -> boardgames) and ONE domain where we create novel patterns that do not exist on the Internet. ⚠️

🟧 % of predictable, biased answers by VLMs.
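
A minimal sketch of how such percentages could be computed, assuming each test item records the true count, the well-known count of the unmodified subject, and the model's answer. The `Item` fields, the `score` helper, and the sample values are hypothetical illustrations, not the paper's evaluation code or data.

```python
from dataclasses import dataclass


@dataclass
class Item:
    truth: int   # actual count in the counterfactual image (e.g. 5 legs)
    prior: int   # well-known count for the unmodified subject (e.g. 4 legs)
    answer: int  # count returned by the model


def score(items):
    """Return (accuracy, biased-answer rate) over a list of items."""
    n = len(items)
    if n == 0:
        raise ValueError("no items to score")
    # Correct: the answer matches what is actually in the image.
    accuracy = sum(i.answer == i.truth for i in items) / n
    # Biased: the answer is wrong and equals the well-known count.
    bias_rate = sum(
        i.answer == i.prior and i.answer != i.truth for i in items
    ) / n
    return accuracy, bias_rate


# Made-up answers for illustration only.
items = [
    Item(truth=5, prior=4, answer=4),  # biased
    Item(truth=4, prior=3, answer=3),  # biased
    Item(truth=5, prior=4, answer=5),  # correct
    Item(truth=5, prior=4, answer=6),  # wrong, but not the known count
]
accuracy, bias_rate = score(items)
print(f"accuracy={accuracy:.0%}, biased answers={bias_rate:.0%}")
```

Counting only wrong answers that match the well-known count separates bias from other errors.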

5/8

June 5, 2025 at 7:28 PM

4️⃣ We study bias using neutral counting questions (Q1/Q2), as opposed to setting models up to fail with a textual (adversarial?) prompt (Q3) as in prior work.

4/8

June 5, 2025 at 7:28 PM

3️⃣ In our tests, VLMs recognize ✅ every subject and its well-known visual elements (e.g., legs/stripes) 100% of the time. But they fail ❌ to count those elements in counterfactual images,

e.g., when:

- an extra leg is added to a 4-legged animal
- an extra stripe is added to the 3-striped Adidas logo

3/8

June 5, 2025 at 7:28 PM

2️⃣ Asking VLMs to examine the image carefully or to use code/tools won't help since they are so (over)confident.

Hard to believe? 😅
Image to try yourself: 👇 http://s.anhnguyen.me/250602__zebra_original_image.png

More examples: github.com/anvo25/vlms-...

2/8

June 5, 2025 at 7:28 PM

🧵 Vision Language Models are ⚠️ biased

Q: Count the legs of this animal?
🤖: 4 ❌

Same problem:
- w/ 5 best VLMs: GPT-4.1, o3, o4-mini, Gemini 2.5 Pro, Sonnet 3.7
- on 7 domains: animals, logos, flags, chess, boardgames, optical illusions, patterned grids

code, paper, data: vlmsarebiased.github.io

June 5, 2025 at 7:28 PM
