Anna Seo Gyeong Choi
@annaseogyeongchoi.bsky.social
phd @ cornell information science
Please read our actual paper for more interesting details! 📄 arxiv.org/abs/2510.00962
Come discuss #FairML with us at our presentation session, happening right now (Nov 5th, 7 PM EST) at Gather Session 3!
Analyzing Dialectical Biases in LLMs for Knowledge and Reasoning Benchmarks
Large language models (LLMs) are ubiquitous in modern day natural language processing. However, previous work has shown degraded LLM performance for under-represented English dialects. We analyze the ...
arxiv.org
November 6, 2025 at 12:13 AM
Huge thanks to our collaborators from @cornelluniversity.bsky.social and Apple, and to the previous researchers who’ve inspired us to do this work: @diyiyang.bsky.social, @jurafsky.bsky.social, @valentinhofmann.bsky.social, @angelinawang.bsky.social, @mixedlinguist.bsky.social, @emilymbender.bsky.social
November 6, 2025 at 12:13 AM
Dialect speakers already face real quality-of-service harms when distinctive grammatical structures systematically diverge from what LLMs were trained on. Pinpointing specific grammar rules gives us concrete targets for bias mitigation that can transfer across multiple dialects! 🎯
November 6, 2025 at 12:11 AM
We can decompose performance degradation by individual grammar rules.
Three rules – existential “it”, zero copula, and y’all – account for roughly half of a dialect’s accuracy drop relative to Standard American English.
November 6, 2025 at 12:10 AM
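For readers who want a feel for what "decomposing by grammar rule" can look like, here is a minimal sketch. It is not the paper's method: the records are synthetic, the rule labels and the helper names `drop_by_rule` and `share_of_total_drop` are hypothetical, and it assumes each question was transformed with exactly one rule.

```python
from collections import defaultdict

# Synthetic records, NOT results from the paper:
# (rule applied, SAE answer correct?, dialect answer correct?)
records = [
    ("existential_it", True, False),
    ("existential_it", True, True),
    ("zero_copula", True, False),
    ("zero_copula", True, False),
    ("yall", True, True),
    ("yall", False, False),
]

def drop_by_rule(records):
    """Accuracy drop (SAE accuracy minus dialect accuracy) per rule."""
    totals = defaultdict(lambda: [0, 0, 0])  # count, SAE correct, dialect correct
    for rule, sae_ok, dialect_ok in records:
        entry = totals[rule]
        entry[0] += 1
        entry[1] += int(sae_ok)
        entry[2] += int(dialect_ok)
    return {rule: (s - d) / n for rule, (n, s, d) in totals.items()}

def share_of_total_drop(drops):
    """Each rule's fraction of the summed per-rule drops."""
    total = sum(drops.values())
    if total == 0:
        return {rule: 0.0 for rule in drops}
    return {rule: drop / total for rule, drop in drops.items()}

drops = drop_by_rule(records)
shares = share_of_total_drop(drops)
```

Shares are only easy to read when no rule improves accuracy; a negative drop would need separate handling in a real analysis.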
Example:
SAE: “Can you drive with a beer in Texas?” → Correct Answer: No
Dialect: “Can y’all drive with a beer in Texas?” → GPT-4o-mini Answer: Yes
Same meaning. Different grammar. Different results.
November 6, 2025 at 12:10 AM
We used the Multi-VALUE package to transform Standard American English questions from QA datasets into dialectal variants based on grammatical rules.
November 6, 2025 at 12:10 AM
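To illustrate the idea of rule-based transformation, here is a toy sketch. It does not use the Multi-VALUE API, and the real rules are far more careful about syntax. The two regex rules and the names `apply_yall`, `apply_zero_copula`, and `transform` are hypothetical stand-ins.

```python
import re

def apply_yall(text: str) -> str:
    """Toy rule: replace lowercase 'you' with "y'all"."""
    return re.sub(r"\byou\b", "y'all", text)

def apply_zero_copula(text: str) -> str:
    """Toy rule: drop 'is' or 'are' (no syntactic checks at all)."""
    return re.sub(r"\b(is|are)\s+", "", text)

# Hypothetical rule registry; names are ours, not Multi-VALUE's.
RULES = {"yall": apply_yall, "zero_copula": apply_zero_copula}

def transform(text: str, rule_names: list[str]) -> str:
    """Apply the named rules in order and return the rewritten text."""
    for name in rule_names:
        text = RULES[name](text)
    return text

sae = "Can you drive with a beer in Texas?"
dialect = transform(sae, ["yall"])
print(dialect)
```

Keeping each rule as its own function means a question can be rewritten with one rule at a time, which is what makes a per-rule comparison possible.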
We studied 6 English dialects (African American, Appalachian, Chicano, Indian, Singaporean, Southern) across 3 LLMs using 3 multiple-choice QA benchmarks.
The question: Do dialects affect performance even on easy tasks?
Answer: YES, with worst performance on Singaporean English.
November 6, 2025 at 12:08 AM
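The setup above amounts to a grid of 6 dialects × 3 LLMs × 3 benchmarks. A minimal sketch of enumerating that grid and picking the lowest-scoring dialect follows. The model and benchmark names are placeholders (the thread does not list them all), and `worst_dialect` is a hypothetical helper, not code from the paper.

```python
from itertools import product
from statistics import mean

DIALECTS = ["African American", "Appalachian", "Chicano",
            "Indian", "Singaporean", "Southern"]
MODELS = ["model_a", "model_b", "model_c"]                  # placeholders
BENCHMARKS = ["benchmark_1", "benchmark_2", "benchmark_3"]  # placeholders

# Every (dialect, model, benchmark) cell that would need an evaluation run.
grid = list(product(DIALECTS, MODELS, BENCHMARKS))

def worst_dialect(accuracy):
    """Return the dialect with the lowest mean accuracy.

    `accuracy` maps (dialect, model, benchmark) -> accuracy in [0, 1].
    """
    by_dialect = {}
    for (dialect, _model, _benchmark), acc in accuracy.items():
        by_dialect.setdefault(dialect, []).append(acc)
    return min(by_dialect, key=lambda d: mean(by_dialect[d]))
```

Averaging over models and benchmarks is one possible choice; a per-model breakdown may be more informative when the models differ a lot.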