lee-messi.bsky.social
@lee-messi.bsky.social
These findings suggest that despite advances in reducing bias in LLM outputs, underlying processing biases persist that could systematically influence AI decision-making in ways not captured by current evaluations.
March 17, 2025 at 2:23 PM
In our study, we also found that models used an average of 53.33% more reasoning tokens to complete tasks involving association-incompatible information versus compatible information. Given the energy costs of AI model inference, this overhead could have environmental consequences at scale.
For instance, we find that reasoning models are more likely to refuse to generate counter-stereotypical information in RM-IATs about race, which can further reinforce existing societal stereotypes and power structures.
The effect sizes we observed (d = 0.53-1.26) represent medium to large effects, consistent with past IAT research on humans and on word embedding models. We argue these biases have real-world consequences, especially as reasoning models become widely used.
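For readers unfamiliar with the metric: Cohen's d is the difference between two condition means divided by their pooled standard deviation. A minimal sketch over per-condition reasoning-token counts (the counts below are made-up illustrations, not data from the paper):

```python
from statistics import mean, stdev

def cohens_d(a, b):
    """Pooled-SD Cohen's d: (mean(a) - mean(b)) / s_pooled."""
    na, nb = len(a), len(b)
    s_pooled = (((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2)
                / (na + nb - 2)) ** 0.5
    return (mean(a) - mean(b)) / s_pooled

# Hypothetical reasoning-token counts per trial (illustrative only)
incompatible_tokens = [820, 910, 1005, 870]
compatible_tokens = [640, 700, 655, 610]

d = cohens_d(incompatible_tokens, compatible_tokens)  # positive d = more tokens when incompatible
```

A positive d here means the model spent more reasoning tokens in the association-incompatible condition, the token-based analogue of slower human IAT reaction times.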
We found that o3-mini requires more tokens when processing association-incompatible than association-compatible information in 9 of 10 RM-IATs, similar to how humans take more time to pair groups with attributes that don't match established associations.
Instead of examining model outputs, we measured the number of reasoning tokens used to complete different tasks. This parallels how the human IAT measures differences in reaction time when people respond to association-incompatible versus compatible pairings.
In the RM-IAT, o3-mini is instructed to categorize group words (e.g., he, she) in ways that are either compatible (e.g., men-career, women-family) or incompatible (e.g., men-family, women-career) with established associations.
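A minimal sketch of how the two RM-IAT conditions could be constructed; the word lists and prompt wording here are illustrative stand-ins, not the paper's exact stimuli:

```python
# Illustrative stimuli (not the paper's actual word lists)
GROUP_WORDS = {"men": ["he", "him", "man"], "women": ["she", "her", "woman"]}
ATTRIBUTE_WORDS = {"career": ["office", "salary"], "family": ["home", "children"]}

def build_condition(pairing):
    """Build one RM-IAT instruction from a group -> attribute pairing,
    e.g. {"men": "career", "women": "family"} for the compatible condition."""
    rules = "; ".join(f"sort {group} words together with {attr} words"
                      for group, attr in pairing.items())
    return f"Categorize each word into one of two bins. Rules: {rules}."

compatible = build_condition({"men": "career", "women": "family"})
incompatible = build_condition({"men": "family", "women": "career"})
```

The same categorization task is posed under both pairings; only the rule assignment differs, so any gap in reasoning-token usage between the two prompts is attributable to association compatibility.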
We adapted the Implicit Association Test, a method widely used to assess implicit bias in humans, for reasoning models. We call this the RM-IAT. We gave o3-mini 10 RM-IATs documented in the social psychology literature.
arxiv.org/abs/2503.11572
Implicit Bias-Like Patterns in Reasoning Models