Allison Koenecke
allisonkoe.bsky.social
Allison Koenecke
@allisonkoe.bsky.social
asst prof @ cornell info sci | fairness in tech, public health & services | alum of MSR, Stanford ICME, NERA Econ, MIT Math | she/her | koenecke.infosci.cornell.edu
This is likely due to differences in tokenization between Simplified Chinese and Traditional Chinese. The exact same names, when translated between language settings, result in significantly different numbers of tokens when represented in each of the models. (12/14)
June 22, 2025 at 9:16 PM
But, written character choice (in Traditional or Simplified) seems to be the primary driver of LLM preferences. Conditioning on the same names (which have different characters in Traditional vs. Simplified), we can flip our results & get majority Simplified names selected (11/14)
June 22, 2025 at 9:16 PM
(3) Some LLMs prefer certain characters, like 俊 and 宇, which are more common in Taiwanese names. Baichuan-2 often describes selected Taiwanese names as having qualities related to “talent” and “wisdom.” This does seem like a partial explanation! (10/14)
June 22, 2025 at 9:16 PM
(2) Gender bias exists: male names are selected more frequently than female names in almost all LLMs. But, balancing our experiments on gender still yields a slight preference for Taiwanese names. (9/14)
June 22, 2025 at 9:16 PM
(1) We define name popularity both as (a) names appearing often in online searches, like celebrities and (b) population counts. Controlling for either definition doesn’t affect LLM preference for Taiwanese names. (8/14)
June 22, 2025 at 9:16 PM
Task 2: Conversely, LLMs disproportionately favor Traditional Chinese names. This trend holds regardless of LLM degree of adherence to prompt instructions (with some LLMs refusing to choose a candidate without sufficient info–good!, and some always returning a name) (6/14)
June 22, 2025 at 9:16 PM
We hypothesize that this pro-Simplified bias occurs due to the underrepresentation of niche Traditional Chinese terms in training corpora. We studied this by comparing large online corpora with different underlying Chinese scripts as proxies for likely LLM training data. (5/14)
June 22, 2025 at 9:16 PM
Task 1: LLMs perform best when prompted in Simplified Chinese. We find significant unidirectional "misaligned responses": when LLMs are prompted in Traditional Chinese but respond with the Simplified Chinese terms (e.g., with Bo Luo instead of Feng Li for 🍍). (4/14)
June 22, 2025 at 9:16 PM
We audit 11 LLMs on two tasks, comparing responses when prompted in Simplified vs. Traditional Chinese: (1) regional term choice—can LLMs correctly use cultural-specific terms (🍍)? (2) regional name choice—do LLMs show hiring preferences based on how a name is written? (3/14)
June 22, 2025 at 9:16 PM
Depending on whether we prompt an LLM in Simplified or Traditional Chinese, LLMs trained with different regional foci may be differently aligned. E.g., Qwen gets 🍍correct in Simplified, but guesses papaya in Traditional Chinese.(2/14)
June 22, 2025 at 9:16 PM
LLMs are now used in high-stakes tasks—from education to hiring—prone to linguistic biases. We focus on biases in written Chinese: Do LLMs perform differently when prompted in Simplified vs. Traditional Chinese? E.g., words like 🍍should be written differently! (1/14)
June 22, 2025 at 9:16 PM
🎉Excited to present our paper tomorrow at @facct.bsky.social, “Characterizing Bias: Benchmarking Large Language Models in Simplified versus Traditional Chinese”, with @brucelyu17.bsky.social, Jiebo Luo and Jian Kang, revealing 🤖 LLM performance disparities. 📄 Link: arxiv.org/abs/2505.22645
June 22, 2025 at 9:16 PM
📢 Apply by *Feb 17* to join us at our 1-day, in-person & hybrid CHI 2025 workshop, Speech AI for All, where we'll discuss inclusive speech tech for people with speech diversities. Researchers, practitioners, policymakers, & community members welcome! speechai4all.org
January 29, 2025 at 8:19 PM
📢Announcing 1-day CHI 2025 workshop: Speech AI for All! We’ll discuss challenges & impacts of inclusive speech tech for people with speech diversities, connecting researchers, practitioners, policymakers, & community members. 🎉Apply to join us: speechai4all.org
December 16, 2024 at 7:45 PM