lee-messi.bsky.social
@lee-messi.bsky.social
These findings suggest that despite advances in reducing bias in LLM outputs, underlying processing biases persist that could systematically influence AI decision-making in ways not captured by current evaluations.
March 17, 2025 at 2:23 PM
In our study, we also found that models used an average of 53.33% more reasoning tokens to complete tasks involving association-incompatible information versus compatible information. Given the energy costs of AI model inference, this overhead could have environmental consequences at scale.
For instance, we find that reasoning models are more likely to refuse to generate counter-stereotypical information in RM-IATs about race, which can further reinforce existing societal stereotypes and power structures.
The effect sizes we observed (d = 0.53-1.26) represent medium to large effects, consistent with past IAT research on humans and on word embedding models. We argue these biases have real-world consequences, especially as reasoning models become widely used.
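For readers unfamiliar with the metric: Cohen's d is the difference between two condition means divided by their pooled standard deviation. A minimal sketch over per-condition reasoning-token counts (the counts below are made-up illustrations, not data from the paper):

```python
from statistics import mean, stdev

def cohens_d(a, b):
    """Pooled-SD Cohen's d: (mean(a) - mean(b)) / s_pooled."""
    na, nb = len(a), len(b)
    s_pooled = (((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2)
                / (na + nb - 2)) ** 0.5
    return (mean(a) - mean(b)) / s_pooled

# Hypothetical reasoning-token counts per trial (illustrative only)
incompatible_tokens = [820, 910, 1005, 870]
compatible_tokens = [640, 700, 655, 610]

d = cohens_d(incompatible_tokens, compatible_tokens)  # positive d = more tokens when incompatible
```

A positive d here means the model spent more reasoning tokens in the association-incompatible condition, the token-based analogue of slower human IAT reaction times.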
We found that o3-mini requires more tokens when processing association-incompatible than association-compatible information in 9 of 10 RM-IATs, similar to how humans take more time to pair groups with attributes that don't match established associations.
Instead of examining model outputs, we measured the number of reasoning tokens used to complete different tasks. This parallels how the human IAT measures differences in reaction time when people respond to association-incompatible versus compatible pairings.
In the RM-IAT, o3-mini is instructed to categorize group words (e.g., he, she) in ways that are either compatible (e.g., men-career, women-family) or incompatible (e.g., men-family, women-career) with established associations.
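A minimal sketch of how the two RM-IAT conditions could be constructed; the word lists and prompt wording here are illustrative stand-ins, not the paper's exact stimuli:

```python
# Illustrative stimuli (not the paper's actual word lists)
GROUP_WORDS = {"men": ["he", "him", "man"], "women": ["she", "her", "woman"]}
ATTRIBUTE_WORDS = {"career": ["office", "salary"], "family": ["home", "children"]}

def build_condition(pairing):
    """Build one RM-IAT instruction from a group -> attribute pairing,
    e.g. {"men": "career", "women": "family"} for the compatible condition."""
    rules = "; ".join(f"sort {group} words together with {attr} words"
                      for group, attr in pairing.items())
    return f"Categorize each word into one of two bins. Rules: {rules}."

compatible = build_condition({"men": "career", "women": "family"})
incompatible = build_condition({"men": "family", "women": "career"})
```

The same categorization task is posed under both pairings; only the rule assignment differs, so any gap in reasoning-token usage between the two prompts is attributable to association compatibility.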
We adapted the Implicit Association Test, a method widely used to assess implicit bias in humans, for reasoning models. We call this the RM-IAT. We gave o3-mini 10 RM-IATs documented in the social psychology literature.
arxiv.org/abs/2503.11572
Implicit Bias-Like Patterns in Reasoning Models