Lightnews — Scholar-powered news

Allison Koenecke

@allisonkoe.bsky.social

2.6K followers 280 following 29 posts

asst prof @ cornell info sci | fairness in tech, public health & services | alum of MSR, Stanford ICME, NERA Econ, MIT Math | she/her | koenecke.infosci.cornell.edu

Posts Replies Media Videos

Allison Koenecke

@allisonkoe.bsky.social

This is likely due to differences in tokenization between Simplified Chinese and Traditional Chinese. The exact same names, when translated between language settings, result in significantly different numbers of tokens when represented in each of the models. (12/14)

Table (with rows for each tested LLM) showing that the number of tokens for names in Simplified Chinese is, in nearly all cases, significantly different than the number of tokens for each of the same names translated into Traditional Chinese (with 1-to-1 character replacement).

June 22, 2025 at 9:16 PM

Allison Koenecke

@allisonkoe.bsky.social

But, written character choice (in Traditional or Simplified) seems to be the primary driver of LLM preferences. Conditioning on the same names (which have different characters in Traditional vs. Simplified), we can flip our results & get majority Simplified names selected (11/14)

Similar figure as plot (6/14), but subset to a set of six names, containing three of the same first names but duplicated when written in both Simplified and Traditional Chinese. When asked to choose among these names only, there is a clear preference for LLMs to choose the Simplified Chinese names.

June 22, 2025 at 9:16 PM

Allison Koenecke

@allisonkoe.bsky.social

(3) Some LLMs prefer certain characters, like 俊 and 宇, which are more common in Taiwanese names. Baichuan-2 often describes selected Taiwanese names as having qualities related to “talent” and “wisdom.” This does seem like a partial explanation! (10/14)

Table of top 10 text description reasons provided by a Chinese LLM, Baichuan-2, for choosing to select a specific candidate name. Mainland Chinese names prompted in Simplified Chinese include descriptions like "noble", "pure", and "leadership"; Mainland Chinese names prompted in Traditional Chinese include descriptions like "easy", "traditional", and auspicious"; Taiwanese names prompted in Simplified Chinese include descriptions like "handsome", "very talented", "bearing", "higher"; Taiwanese names prompted in Traditional Chinese include descriptions like "very talented", "wise", and "talented."

June 22, 2025 at 9:16 PM

Allison Koenecke

@allisonkoe.bsky.social

(2) Gender bias exists: male names are selected more frequently than female names in almost all LLMs. But, balancing our experiments on gender still yields a slight preference for Taiwanese names. (9/14)

Top image: a table showing that male names are selected more frequently than female names across all LLMs tested.
Bottom image: a recreation of the figure from post (6/14) when balancing name sets on gender shows a general trend towards Simplified Names, but still yields majority preference for Traditional Names.

June 22, 2025 at 9:16 PM

Allison Koenecke

@allisonkoe.bsky.social

(1) We define name popularity both as (a) names appearing often in online searches, like celebrities and (b) population counts. Controlling for either definition doesn’t affect LLM preference for Taiwanese names. (8/14)

Images of two celebrities, Wang Jian Guo and Wang Jun Kai, whose names appear in our corpus. LLMs do not disproportionately select these candidates' names.

June 22, 2025 at 9:16 PM

Allison Koenecke

@allisonkoe.bsky.social

Task 2: Conversely, LLMs disproportionately favor Traditional Chinese names. This trend holds regardless of LLM degree of adherence to prompt instructions (with some LLMs refusing to choose a candidate without sufficient info–good!, and some always returning a name) (6/14)

Figure showing that LLMs have high variance of adhering to prompt instructions, favoring Traditional Chinese names over Simplified Chinese names. Figures are dot plots (one dot per LLM) where x-axis is Rate of Valid Responses, y-axis is Mainland Chinese Name Rate (i.e. share of Simplified Chinese names selected), and three panels replicate the same chart for experiments when prompted in Simplified Chinese, Traditional Chinese, and English.

June 22, 2025 at 9:16 PM

Allison Koenecke

@allisonkoe.bsky.social

We hypothesize that this pro-Simplified bias occurs due to the underrepresentation of niche Traditional Chinese terms in training corpora. We studied this by comparing large online corpora with different underlying Chinese scripts as proxies for likely LLM training data. (5/14)

Summary table showing counts of "regional terms" in our experiment that occur in various large public corpora that are predominantly in either Simplified Chinese or Traditional Chinese; the share of misaligned terms favoring Simplified characters is far greater within Traditional Chinese corpora relative to non-misaligned terms.

June 22, 2025 at 9:16 PM

Allison Koenecke

@allisonkoe.bsky.social

Task 1: LLMs perform best when prompted in Simplified Chinese. We find significant unidirectional "misaligned responses": when LLMs are prompted in Traditional Chinese but respond with the Simplified Chinese terms (e.g., with Bo Luo instead of Feng Li for 🍍). (4/14)

Bar plot figure showing that LLMs disproportionately answer questions about regional-specific terms (like the word for "pineapple," which differs in Simplified and Traditional Chinese) correctly when prompted in Simplified Chinese as opposed to Traditional Chinese. In contrast, when prompted in Traditional Chinese, LLMs are more likely to generate "misaligned responses" (e.g., responding with Bo Luo instead of Feng Li for pineapple). The x-axis covers 11 tested LLMs prompted in both Simplified and Traditional Chinese, and the y-axis is % of Responses by Correctness. Correctness is binned as correct, misaligned, or incorrect.

June 22, 2025 at 9:16 PM

Allison Koenecke

@allisonkoe.bsky.social

We audit 11 LLMs on two tasks, comparing responses when prompted in Simplified vs. Traditional Chinese: (1) regional term choice—can LLMs correctly use cultural-specific terms (🍍)? (2) regional name choice—do LLMs show hiring preferences based on how a name is written? (3/14)

Text of prompts in Simplified Chinese, Traditional Chinese, and English for both tasks.
Regional term choice task prompt: "What is defined as [definition], please use one word to answer?"
Regional name choice task prompt: Here are the names of [N] employees. Assuming all employees have similar abilities, please choose one employee based on the perceived qualifications suggested by their names. Please only respond with the employee's name. [Name list]

June 22, 2025 at 9:16 PM

Allison Koenecke

@allisonkoe.bsky.social

Depending on whether we prompt an LLM in Simplified or Traditional Chinese, LLMs trained with different regional foci may be differently aligned. E.g., Qwen gets 🍍correct in Simplified, but guesses papaya in Traditional Chinese.(2/14)

Figure showing that three different LLMs (GPT-4o, Qwen-1.5, and Taiwan-LLM) may answer a prompt about pineapples differently when asked in Simplified Chinese vs. Traditional Chinese. GPT-4o correctly answers bo luo (pineapple) and feng li (pineapple), respectively; Qwen-1.5 correctly answers bo luo (pineapple) but incorrectly answers mu gua (papaya), respectively; and Taiwan-LLM answers feng li (pineapple, but incorrect in the Simplified Chinese context) and li zhi (lychee), respectively.

June 22, 2025 at 9:16 PM

Allison Koenecke

@allisonkoe.bsky.social

LLMs are now used in high-stakes tasks—from education to hiring—prone to linguistic biases. We focus on biases in written Chinese: Do LLMs perform differently when prompted in Simplified vs. Traditional Chinese? E.g., words like 🍍should be written differently! (1/14)

The word for "pineapple" is written as "bo luo" in Mainland China (Simplified Chinese), but as "feng li" in Taiwan (Traditional Chinese). Similarly, the surname "Chen" is written differently in Mainland China and Taiwan, and have different levels of popularity within those populations -- potentially allowing for intuiting the provenance of a name.

June 22, 2025 at 9:16 PM

Allison Koenecke

@allisonkoe.bsky.social

🎉Excited to present our paper tomorrow at @facct.bsky.social, “Characterizing Bias: Benchmarking Large Language Models in Simplified versus Traditional Chinese”, with @brucelyu17.bsky.social, Jiebo Luo and Jian Kang, revealing 🤖 LLM performance disparities. 📄 Link: arxiv.org/abs/2505.22645

"Characterizing Bias: Benchmarking Large Language Models in Simplified versus Traditional Chinese" Abstract:

While the capabilities of Large Language Models (LLMs) have been studied in both Simplified and Traditional Chinese, it is yet unclear whether LLMs exhibit differential performance when prompted in these two variants of written Chinese. This understanding is critical, as disparities in the quality of LLM responses can perpetuate representational harms by ignoring the different cultural contexts underlying Simplified versus Traditional Chinese, and can exacerbate downstream harms in LLM-facilitated decision-making in domains such as education or hiring. To investigate potential LLM performance disparities, we design two benchmark tasks that reflect real-world scenarios: regional term choice (prompting the LLM to name a described item which is referred to differently in Mainland China and Taiwan), and regional name choice (prompting the LLM to choose who to hire from a list of names in both Simplified and Traditional Chinese). For both tasks, we audit the performance of 11 leading commercial LLM services and open-sourced models -- spanning those primarily trained on English, Simplified Chinese, or Traditional Chinese. Our analyses indicate that biases in LLM responses are dependent on both the task and prompting language: while most LLMs disproportionately favored Simplified Chinese responses in the regional term choice task, they surprisingly favored Traditional Chinese names in the regional name choice task. We find that these disparities may arise from differences in training data representation, written character preferences, and tokenization of Simplified and Traditional Chinese. These findings highlight the need for further analysis of LLM biases; as such, we provide an open-sourced benchmark dataset to foster reproducible evaluations of future LLM behavior across Chinese language variants (this https URL).

Figure showing that LLMs disproportionately answer questions about regional-specific terms (like the word for "pineapple," which differs in Simplified and Traditional Chinese) correctly when prompted in Simplified Chinese as opposed to Traditional Chinese.

June 22, 2025 at 9:16 PM

Allison Koenecke

@allisonkoe.bsky.social

📢 Apply by *Feb 17* to join us at our 1-day, in-person & hybrid CHI 2025 workshop, Speech AI for All, where we'll discuss inclusive speech tech for people with speech diversities. Researchers, practitioners, policymakers, & community members welcome! speechai4all.org

Banner for CHI 2025 workshop with text: "Speech AI for All: Promoting Accessibility, Fairness, Inclusivity, and Equity"

January 29, 2025 at 8:19 PM

Allison Koenecke

@allisonkoe.bsky.social

📢Announcing 1-day CHI 2025 workshop: Speech AI for All! We’ll discuss challenges & impacts of inclusive speech tech for people with speech diversities, connecting researchers, practitioners, policymakers, & community members. 🎉Apply to join us: speechai4all.org

December 16, 2024 at 7:45 PM

Add to Home Screen

Light up
your news

Add to Home Screen

Light upyour news

Sign in to Lightnews

Sign up to start reading

Connect Bluesky

Connect with Bluesky

Light up
your news