Anh (Totti) Nguyen
anh-ng8.bsky.social
In search of an intelligent and explainable AI. Machine Learning, Human-Computer Interaction, and Javascript. Associate Professor at Auburn U.
🌐 https://anhnguyen.me/
🐦 https://x.com/anh_ng8
8️⃣ Work led by the super duo
✨ An Võ + Khải Nguyên Nguyễn ✨
w/ countless assists from Mohammad Taesiri, Tường Vy Đặng & Prof. Daeyoung Kim.

Code & data: vlmsarebiased.github.io
Paper: arxiv.org/abs/2505.23941

inspired by vlmsareblind.github.io

Thank you for any feedback 🙏

8/8
June 5, 2025 at 7:28 PM
7️⃣ On a task that we create from scratch:

Q: Count the circles in cell C3.
🤖: 3 ❌

VLMs are only ~22% accurate and biased towards the surrounding cells.

7/8
June 5, 2025 at 7:28 PM
6️⃣ Optical illusions are an interesting task. VLMs know all 6 illusions and their expected answers.

But here we modify the Ebbinghaus pattern so that the two inner circles clearly differ in size. And...

o3: equal ❌
Sonnet 3.7: equal ❌

6/8
June 5, 2025 at 7:28 PM
5️⃣ Bias exists across SIX domains of decreasing popularity (animals -> logos -> flags -> chess pieces -> optical illusions -> boardgames) and ONE domain where we create novel patterns that do not exist on the Internet. ⚠️

🟧 % of predictable, biased answers by VLMs.

5/8
June 5, 2025 at 7:28 PM
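The two numbers reported in this thread (accuracy and the % of predictable, biased answers) can be sketched as follows. This is only an illustration, not the paper's evaluation code: the `Item` class, the `score` function, and the toy data are all hypothetical, and it assumes each counterfactual image has one correct count and one "well-known" count that differ.

```python
from dataclasses import dataclass


@dataclass
class Item:
    # All field names are hypothetical, chosen for this sketch only.
    prediction: int  # the count the model answered
    truth: int       # the correct count in the counterfactual image
    biased: int      # the well-known count (e.g. 4 legs for a zebra)


def score(items):
    """Return (accuracy, bias_rate) as fractions of all items."""
    n = len(items)
    if n == 0:
        raise ValueError("need at least one item")
    accuracy = sum(i.prediction == i.truth for i in items) / n
    bias_rate = sum(i.prediction == i.biased for i in items) / n
    return accuracy, bias_rate


# Toy data, invented for illustration (not results from the paper).
items = [
    Item(prediction=4, truth=5, biased=4),  # 5-legged animal, model says 4
    Item(prediction=5, truth=5, biased=4),  # correct answer
    Item(prediction=3, truth=4, biased=3),  # 4-striped logo, model says 3
    Item(prediction=4, truth=5, biased=4),
]
accuracy, bias_rate = score(items)
print(f"accuracy={accuracy:.2f} bias_rate={bias_rate:.2f}")
```

Since the biased answer is by assumption never the correct one here, the two rates cannot sum to more than 1; the remainder is wrong answers that are not the predictable one.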
4️⃣ We study bias using neutral counting questions (Q1/Q2) rather than setting models up to fail with a textual (adversarial?) prompt (Q3), as in prior work.

4/8
June 5, 2025 at 7:28 PM
3️⃣ In our tests, VLMs recognize ✅ every subject and its well-known visual elements (e.g. legs/stripes) 100% of the time. But they fail ❌ to count on the counterfactual images,

e.g., when:

- an extra leg is added to 4-legged animals
- an extra stripe is added to the 3-striped Adidas logo

3/8
June 5, 2025 at 7:28 PM
2️⃣ Asking VLMs to examine the image carefully or to use code/tools won't help, since they are so (over)confident.

Hard to believe? 😅
Image to try yourself: 👇 http://s.anhnguyen.me/250602__zebra_original_image.png

More examples: github.com/anvo25/vlms-...

2/8
June 5, 2025 at 7:28 PM
🧵 Vision Language Models are ⚠️ biased

Q: Count the legs of this animal?
🤖: 4 ❌

Same problem:
- w/ 5 best VLMs: GPT-4.1, o3, o4-mini, Gemini 2.5 Pro, Sonnet 3.7
- on 7 domains: animals, logos, flags, chess, boardgames, optical illusions, patterned grids

code, paper, data: vlmsarebiased.github.io
June 5, 2025 at 7:28 PM