Allison Koenecke
allisonkoe.bsky.social
asst prof @ cornell info sci | fairness in tech, public health & services | alum of MSR, Stanford ICME, NERA Econ, MIT Math | she/her | koenecke.infosci.cornell.edu
You've been too busy 🀄izing bias in other contexts!
June 22, 2025 at 9:24 PM
Many thanks to the researchers who have inspired our work!! (14/14) @valentinhofmann.bsky.social @jurafsky.bsky.social @haldaume3.bsky.social @hannawallach.bsky.social @jennwv.bsky.social @diyiyang.bsky.social and many others not yet on Bluesky!
June 22, 2025 at 9:16 PM
We encourage practitioners to use our dataset (github.com/brucelyu17/S...) to audit LLMs for bias before choosing one, and developers to explore diversifying training data and to study tokenization differences across Chinese variants. (13/14)
GitHub - brucelyu17/SC-TC-Bench: [FAccT '25] Characterizing Bias: Benchmarking LLMs in Simplified versus Traditional Chinese
June 22, 2025 at 9:16 PM
This is likely due to differences in tokenization between Simplified and Traditional Chinese: the exact same names, when converted from one script to the other, are represented by significantly different numbers of tokens in each of the models. (12/14)
June 22, 2025 at 9:16 PM
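To make the tokenization point concrete, here is a toy sketch, not any audited model's actual tokenizer: a greedy longest-match tokenizer over a hypothetical vocabulary (TOY_VOCAB) that happens to contain the Simplified name 张伟 as a single token but not its Traditional form 張偉. Characters missing from the vocabulary fall back to one token per UTF-8 byte, loosely mimicking the byte fallback in real subword tokenizers. The vocabulary and the function name are assumptions for illustration.

```python
# Hypothetical toy vocabulary (an assumption, not any real model's vocabulary):
# it contains the Simplified name "张伟" as one merged token, but only the single
# character "張" from its Traditional form "張偉".
TOY_VOCAB = {"张伟", "张", "伟", "張"}


def greedy_tokenize(text: str, vocab: set[str]) -> list[str]:
    """Greedy longest-match tokenization; characters not in `vocab` fall back to
    one token per UTF-8 byte."""
    max_len = max(len(tok) for tok in vocab)
    tokens: list[str] = []
    i = 0
    while i < len(text):
        # Try the longest vocabulary entry that matches at position i first.
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if piece in vocab:
                tokens.append(piece)
                i += length
                break
        else:
            # No match: byte fallback, one token per UTF-8 byte of the current character.
            tokens.extend(f"<0x{b:02X}>" for b in text[i].encode("utf-8"))
            i += 1
    return tokens


for name in ("张伟", "張偉"):
    print(name, len(greedy_tokenize(name, TOY_VOCAB)))
# 张伟 1
# 張偉 4
```

The same name costs 1 token in one script and 4 in the other purely because of what the vocabulary happened to learn; real tokenizers use BPE or SentencePiece rather than this simplified longest-match scheme, but the count asymmetry arises the same way.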
(4) But the written script (Traditional or Simplified characters) seems to be the primary driver of LLM preferences. Conditioning on the same names (which are written with different characters in Traditional vs. Simplified), we can flip our results and get a majority of Simplified names selected. (11/14)
June 22, 2025 at 9:16 PM
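One way to hold the names fixed while varying only the script, sketched under our own assumptions rather than the paper's exact procedure, is to render each candidate pair entirely in Simplified and entirely in Traditional and compare an LLM's picks across the two variants. The character map S2T and the helper names are hypothetical; a real study would need a complete conversion table, since Simplified-to-Traditional conversion is not always one-to-one.

```python
# Hypothetical, deliberately tiny Simplified -> Traditional map for illustration only.
S2T = {"张": "張", "伟": "偉", "刘": "劉", "华": "華"}
T2S = {trad: simp for simp, trad in S2T.items()}


def to_script(name: str, script: str) -> str:
    """Rewrite a name character by character in the target script
    ("simplified" or "traditional")."""
    if script not in ("simplified", "traditional"):
        raise ValueError(f"unknown script: {script}")
    table = S2T if script == "traditional" else T2S
    # Characters written identically in both scripts (e.g., 俊, 宇) pass through unchanged.
    return "".join(table.get(ch, ch) for ch in name)


def script_variants(pair: tuple[str, str]) -> dict[str, tuple[str, str]]:
    """Render the same two candidate names entirely in each script, so the two
    prompt variants differ only in script, not in which names appear."""
    return {
        script: (to_script(pair[0], script), to_script(pair[1], script))
        for script in ("simplified", "traditional")
    }


print(script_variants(("张伟", "劉俊宇")))
# {'simplified': ('张伟', '刘俊宇'), 'traditional': ('張偉', '劉俊宇')}
```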
(3) Some LLMs prefer certain characters, like 俊 and 宇, which are more common in Taiwanese names. Baichuan-2 often describes selected Taiwanese names as having qualities related to “talent” and “wisdom.” This does seem like a partial explanation! (10/14)
June 22, 2025 at 9:16 PM
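A simple way to check whether specific characters such as 俊 and 宇 are over-represented in an LLM's picks is a "lift" ratio: a character's share among the selected names divided by its share among all candidate names. The sketch below assumes every selected name comes from the candidate pool; the names are hypothetical placeholders, not study data, and a real analysis might restrict the count to given-name characters rather than include surnames.

```python
from collections import Counter


def character_lift(selected: list[str], pool: list[str]) -> dict[str, float]:
    """For each character in the selected names, divide its share among
    selected-name characters by its share among all candidate-name characters.
    A lift above 1.0 means the character is over-represented in the picks.
    Assumes every selected name also appears in `pool`."""
    sel = Counter(ch for name in selected for ch in name)
    tot = Counter(ch for name in pool for ch in name)
    n_sel, n_tot = sum(sel.values()), sum(tot.values())
    return {ch: (count / n_sel) / (tot[ch] / n_tot) for ch, count in sel.items()}


# Hypothetical candidate pool and picks, for illustration only.
pool = ["陳俊宇", "王建國", "林宇", "张伟"]
picks = ["陳俊宇", "林宇"]
lift = character_lift(picks, pool)
print(lift["宇"], lift["俊"])  # 2.0 2.0
```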
(2) Gender bias exists: male names are selected more frequently than female names in almost all LLMs. But even when we balance our experiments on gender, LLMs still show a slight preference for Taiwanese names. (9/14)
June 22, 2025 at 9:16 PM
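A minimal sketch of one way to balance an experiment on gender, assuming hypothetical name lists keyed by gender: only pair a mainland name with a Taiwanese name of the same gender, and draw the same number of pairs for each gender. Any Taiwanese-vs-mainland gap measured on these pairs then cannot come from the gender mix. The function name, inputs, and names are illustrative assumptions, not the paper's setup.

```python
import random


def gender_balanced_pairs(mainland: dict[str, list[str]],
                          taiwan: dict[str, list[str]],
                          n_per_gender: int,
                          seed: int = 0) -> list[dict[str, str]]:
    """Build candidate pairs in which the mainland and Taiwanese names share a
    gender, with the same number of pairs for every gender label in `mainland`."""
    rng = random.Random(seed)  # seeded so the experiment design is reproducible
    pairs = []
    for gender in sorted(mainland):
        for _ in range(n_per_gender):
            pairs.append({
                "gender": gender,
                "mainland": rng.choice(mainland[gender]),
                "taiwan": rng.choice(taiwan[gender]),
            })
    return pairs


# Hypothetical name lists, for illustration only.
mainland = {"female": ["王芳", "李娜"], "male": ["张伟", "王磊"]}
taiwan = {"female": ["林怡君", "陳雅婷"], "male": ["陳俊宇", "林冠宇"]}
pairs = gender_balanced_pairs(mainland, taiwan, n_per_gender=3)
print(len(pairs))  # 6
```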
(1) We define name popularity in two ways: (a) how often a name appears in online searches (e.g., celebrity names) and (b) population counts. Controlling for either definition doesn't change LLMs' preference for Taiwanese names. (8/14)
June 22, 2025 at 9:16 PM
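Controlling for popularity can be sketched as stratification: group trials by a popularity bucket shared by both candidates, then compute the Taiwanese pick rate within each bucket. In a two-candidate setup, if that rate stays above 0.5 in every bucket, popularity alone does not explain the preference. The field names (pop_bucket, picked) and the trial records are hypothetical.

```python
from collections import defaultdict


def taiwan_rate_by_stratum(trials: list[dict], key: str) -> dict[str, float]:
    """Fraction of trials in which the Taiwanese name was picked, computed
    separately within each stratum of `key`."""
    counts = defaultdict(lambda: [0, 0])  # stratum -> [Taiwanese picks, total trials]
    for trial in trials:
        tally = counts[trial[key]]
        tally[0] += trial["picked"] == "taiwan"  # bool counts as 0 or 1
        tally[1] += 1
    return {stratum: tw / n for stratum, (tw, n) in counts.items()}


# Hypothetical trials in which both candidates fall in the same popularity bucket.
trials = [
    {"pop_bucket": "high", "picked": "taiwan"},
    {"pop_bucket": "high", "picked": "taiwan"},
    {"pop_bucket": "high", "picked": "mainland"},
    {"pop_bucket": "low", "picked": "taiwan"},
    {"pop_bucket": "low", "picked": "mainland"},
]
print(taiwan_rate_by_stratum(trials, "pop_bucket"))
```

The same function works for either popularity definition: only the bucket labels attached to each trial change.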
Why are we seeing this preference for Taiwanese names among LLMs? We use a process of elimination on four likely explanations: popularity, gender, character, and written script. (7/14)
June 22, 2025 at 9:16 PM
Task 2: Conversely, LLMs disproportionately favor Traditional Chinese names. This trend holds regardless of how closely each LLM adheres to the prompt instructions: some LLMs refuse to choose a candidate without sufficient information (good!), while others always return a name. (6/14)
June 22, 2025 at 9:16 PM
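A minimal sketch, with hypothetical outcome labels, of how one might keep refusals from muddying the preference measure: report the refusal rate on its own and compute the Traditional share only over responses that actually picked a name. This lets a model that often refuses and one that always answers be compared on the same preference scale.

```python
from collections import Counter


def summarize_task2(responses: list[str]) -> dict[str, float]:
    """Summarize hypothetical Task 2 outcomes, each labeled "traditional",
    "simplified", or "refused". Assumes `responses` is non-empty."""
    counts = Counter(responses)
    answered = counts["traditional"] + counts["simplified"]
    return {
        "refusal_rate": counts["refused"] / len(responses),
        # NaN when every response was a refusal: there is no preference to measure.
        "traditional_share": counts["traditional"] / answered if answered else float("nan"),
    }


print(summarize_task2(["traditional", "traditional", "simplified", "refused"]))
```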