https://paulrottger.com/
Paper: arxiv.org/abs/2502.08395
Data: huggingface.co/datasets/Pau...
Surprisingly, despite being developed in quite different settings, all models are very similar in how they write about different political issues.
For each of 212 political issues, we prompt LLMs with thousands of realistic requests for writing assistance.
Then we classify the stance that each model response expresses on the issue at hand.
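A minimal sketch of this prompt-and-classify setup (the templates, issues, and functions like `generate_response` and `classify_stance` below are illustrative placeholders, not the actual IssueBench code):

```python
# Sketch: pair writing-assistance templates with issues, query the model,
# then classify the stance of each response. All names are illustrative.
TEMPLATES = [
    "Write a blog post about {issue}.",
    "Help me draft a short essay on {issue}.",
]
ISSUES = ["electric cars", "universal basic income"]  # 212 issues in the real benchmark
STANCES = ["positive", "neutral", "negative", "refusal"]

def generate_response(prompt: str) -> str:
    """Placeholder for a call to the LLM under test."""
    return "dummy response"  # swap in a real model call

def classify_stance(issue: str, response: str) -> str:
    """Placeholder for the stance classifier (e.g. an LLM-as-judge)."""
    return "neutral"  # swap in a real classifier

results = []
for issue in ISSUES:
    for template in TEMPLATES:
        prompt = template.format(issue=issue)
        response = generate_response(prompt)
        results.append({
            "issue": issue,
            "prompt": prompt,
            "stance": classify_stance(issue, response),
        })
```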
grow more important for a global diversity of users.
@morlikow.bsky.social is presenting our poster today at 11:00. Also hyped for @michaelryan207.bsky.social's work and @verenarieser.bsky.social's keynote!
I am particularly excited to check out work on this by @kldivergence.bsky.social @1e0sun.bsky.social @jacyanthis.bsky.social @anjaliruban.bsky.social
As is now tradition, I made some timetables to help me find my way around. Sharing here in case others find them useful too :) 🧵
Models are generally steerable, but will often *hedge* their responses. For example, models will argue that electric cars are bad if you ask them to, but not without also mentioning their benefits (4).
We follow up on this further below.
First, models express a very consistent stance on ≥70% of the issues in IssueBench. This is surprising since nearly all issues we test lack societal consensus. Yet models are often consistently positive (1, 2) or negative (4, 5).
We test for *issue bias* in LLMs by prompting models to write about an issue in many different ways and then classifying the stance of each response. Bias in this setting is when one stance dominates.
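One way such a dominance-based bias measure could look (a sketch with an assumed 70% dominance threshold, not the paper's exact metric):

```python
from collections import Counter

def issue_bias(stances: list[str], threshold: float = 0.7) -> str | None:
    """Return the dominant stance if a single label covers at least
    `threshold` of all responses for an issue, else None. Illustrative only."""
    counts = Counter(stances)
    stance, n = counts.most_common(1)[0]
    return stance if n / len(stances) >= threshold else None

# Example: 8 of 10 responses lean positive -> "positive" dominates.
print(issue_bias(["positive"] * 8 + ["neutral", "negative"]))
```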
We just released IssueBench – the largest, most realistic benchmark of its kind – to answer this question more robustly than ever before.
Long 🧵with spicy results 👇
We find that even the best commercial VLMs struggle with this task.
We find that VLMs produce unsafe responses for at least some multimodal prompts *only because they are multimodal*.
For the multilingual open VLM that we test, we find clear safety differences across languages. Notably, MiniCPM was developed in China and indeed produces the fewest unsafe responses in Chinese.
Commercial models like Claude and GPT, on the other hand, do quite well on MSTS.