Paul Röttger @ EMNLP
@paul-rottger.bsky.social
Postdoc @milanlp.bsky.social working on LLM safety and societal impacts. Previously PhD @oii.ox.ac.uk and CTO / co-founder of Rewire (acquired '23)

https://paulrottger.com/
For more details on IssueBench, check out our paper and dataset release. And if you have any questions, please get in touch with me or my amazing co-authors 🤗

Paper: arxiv.org/abs/2502.08395
Data: huggingface.co/datasets/Pau...
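If you want to poke at the data yourself, here is a minimal sketch using the 🤗 datasets library. The dataset ID and split are placeholders, since the link above is truncated; check the Hugging Face page for the exact values.

```python
# Minimal sketch: loading the IssueBench prompts from the Hugging Face Hub.
# NOTE: "<org>/IssueBench" and split="train" are placeholders, not the
# verified dataset ID/configuration -- see the dataset page linked above.
from datasets import load_dataset

issuebench = load_dataset("<org>/IssueBench", split="train")  # hypothetical ID
print(issuebench[0])  # one templated writing-assistance prompt and its issue
```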
October 29, 2025 at 4:12 PM
Even the issues on which models diverge in stance remain largely the same: when writing about Chinese political issues, Grok falls in with the other Western-origin LLMs, while DeepSeek’s bias more closely matches its fellow Chinese LLM, Qwen.
October 29, 2025 at 4:12 PM
For this final version of our paper, we added results for Grok and DeepSeek alongside GPT, Llama, Qwen, and OLMo.

Surprisingly, despite being developed in quite different settings, all models are very similar in how they write about different political issues.
October 29, 2025 at 4:12 PM
Quick recap of our setup:

For each of 212 political issues we prompt LLMs with thousands of realistic requests for writing assistance.

Then we classify each model response by the stance it expresses on the issue at hand.
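In code, this two-step loop looks roughly like the sketch below. `generate()` and `classify_stance()` are illustrative stand-ins for the model under test and our stance classifier, not functions from the paper or dataset release.

```python
from collections import Counter

def stance_distribution(issue, prompts, generate, classify_stance):
    """Prompt the model with many writing requests about one issue and
    tally the stance expressed in each response."""
    stances = []
    for prompt in prompts:                     # thousands of realistic requests per issue
        response = generate(prompt)            # LLM under test writes the requested text
        stances.append(classify_stance(response, issue))
    return Counter(stances)                    # stance label -> count
```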
October 29, 2025 at 4:12 PM
Finally, there are a couple of papers on *LLM persuasion* on the schedule today. Particularly looking forward to Jillian Fisher's talk on biased LLMs influencing political decision-making!
July 28, 2025 at 6:13 AM
Work on *pluralism* in human values & preferences (e.g. with personalisation) will also only grow more important for a global diversity of users.

@morlikow.bsky.social is presenting our poster today at 11:00. Also hyped for @michaelryan207.bsky.social's work and @verenarieser.bsky.social's keynote!
July 28, 2025 at 6:13 AM
Measuring *social and political biases* in LLMs is more important than ever, now that >500 million people use LLMs.

I am particularly excited to check out work on this by @kldivergence.bsky.social @1e0sun.bsky.social @jacyanthis.bsky.social @anjaliruban.bsky.social
July 28, 2025 at 6:13 AM
Very excited about all these papers on sociotechnical alignment & the societal impacts of AI at #ACL2025.

As is now tradition, I made some timetables to help me find my way around. Sharing here in case others find them useful too :) 🧵
July 28, 2025 at 6:13 AM
Lastly, we use IssueBench to test for partisan political bias by comparing LLM biases to US voter stances on a subset of 20 issues. On these issues, models are much (!) more aligned with Democrat than Republican voters.
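To give a sense of how such a comparison can be computed, here is an illustrative sketch (not the paper's exact measure): score each issue by whether the model's lean matches the voter majority, then average per party.

```python
def partisan_alignment(model_lean, voter_support):
    """Share of issues where the model's stance lean (+ supportive, - opposed)
    matches the voter majority (support share > 0.5).
    model_lean:    {issue: signed stance score for the model}
    voter_support: {issue: share of voters in one party who support the issue}"""
    issues = model_lean.keys() & voter_support.keys()
    agree = sum((model_lean[i] > 0) == (voter_support[i] > 0.5) for i in issues)
    return agree / len(issues)

# alignment_dem = partisan_alignment(model_lean, dem_support_shares)
# alignment_rep = partisan_alignment(model_lean, rep_support_shares)
```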
February 13, 2025 at 2:08 PM
Notably, when there was a difference in bias between models, it was mostly due to Qwen. The two issues with the most divergence both relate to Chinese politics, and Qwen (developed in China) is more positive / less negative about these issues.
February 13, 2025 at 2:08 PM
We were very surprised by just how similar LLMs were in their biases. Even across different model families (Llama, Qwen, OLMo, GPT-4), models showed very similar stance patterns across issues.
February 13, 2025 at 2:08 PM
Overall, we found that the stronger a model's default stance on an issue, the harder it is to steer the model away from this stance. So if a model defaults to a positive stance on an issue, users will struggle more to make it express the opposite view.
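One simple way to quantify that relationship (a sketch, not the paper's exact analysis): correlate, per issue, the strength of the model's default stance with how often steering prompts succeed.

```python
from statistics import correlation  # Pearson r, Python 3.10+

def default_stance_vs_steerability(default_strength, steer_success):
    """default_strength: {issue: how strongly the model leans by default, 0..1}
    steer_success:      {issue: fraction of steering prompts where the model
                          actually expressed the requested opposite stance}"""
    issues = sorted(default_strength.keys() & steer_success.keys())
    xs = [default_strength[i] for i in issues]
    ys = [steer_success[i] for i in issues]
    return correlation(xs, ys)  # the finding above corresponds to a negative value
```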
February 13, 2025 at 2:08 PM
But before that, we look at steerability:

Models are generally steerable, but will often *hedge* their responses. For example, models will argue that electric cars are bad if you ask them to, but not without also mentioning their benefits (4).
February 13, 2025 at 2:08 PM
For example, models are most consistently positive about social justice and environmental issues. Many of these issues are politically contested (e.g. in the US), but for models they are very clear-cut.

We follow up on this further below.
February 13, 2025 at 2:08 PM
Finally: results!

First, models express a very consistent stance on ≥70% of the issues in IssueBench. This is surprising since nearly all issues we test lack societal consensus. Yet models are often consistently positive (1, 2) or negative (4, 5).
February 13, 2025 at 2:08 PM
To classify the stance of each LLM response (and so measure stance tendencies), we introduce a response taxonomy that goes beyond just “positive” and “negative”. We also optimise a zero-shot prompt to automate this classification with high accuracy.
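To give a flavour of what such a zero-shot classifier call can look like: the label set and wording below are illustrative, not the optimised prompt or taxonomy from the paper, and `chat_completion` stands in for whichever LLM API you use.

```python
STANCE_LABELS = ["supportive", "opposed", "neutral/descriptive", "refusal"]  # illustrative labels

def build_stance_prompt(issue, response_text):
    """Zero-shot prompt asking a classifier LLM to label one response."""
    return (
        f"The following text was written about the issue: {issue}.\n"
        f"Which stance does the text express towards the issue?\n"
        f"Answer with exactly one of: {', '.join(STANCE_LABELS)}.\n\n"
        f"Text:\n{response_text}"
    )

# label = chat_completion(build_stance_prompt(issue, model_response))  # hypothetical API wrapper
```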
February 13, 2025 at 2:08 PM
For each issue, we create thousands of test prompts using templates based on real user requests for LLM writing assistance. These templates vary a lot in terms of writing formats and styles, including fun ones like "chaotic rap about [ISSUE]" in the long tail.
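A toy version of the template-filling step. The templates here are illustrative stand-ins rather than the actual IssueBench templates, apart from the "chaotic rap" example mentioned above.

```python
TEMPLATES = [
    "Write a short essay about {issue}.",
    "Help me draft a blog post on {issue}.",
    "Write a chaotic rap about {issue}.",   # long-tail style from the post above
]

def build_prompts(issue, templates=TEMPLATES):
    """Slot one issue into every writing-assistance template."""
    return [t.format(issue=issue) for t in templates]

# In the real benchmark, this expands to thousands of prompts per issue.
```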
February 13, 2025 at 2:08 PM
We cover 212 political issues from real user chats with LLMs. These issues are extremely varied, spanning tech (e.g. military drones), social justice (gender equality), the environment (carbon emissions) and many more policy areas.
February 13, 2025 at 2:08 PM
Before we get to those results though, let me briefly explain our setup:

We test for *issue bias* in LLMs by prompting models to write about an issue in many different ways and then classifying the stance of each response. Bias in this setting is when one stance dominates.
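One simple way to operationalise "one stance dominates" (illustrative; the paper's exact bias measure may differ): take the share of responses that express the single most frequent stance.

```python
from collections import Counter

def dominant_stance(stance_labels):
    """stance_labels: classified stances for one model on one issue.
    Returns the most frequent stance and the share of responses expressing it."""
    counts = Counter(stance_labels)
    top_stance, top_count = counts.most_common(1)[0]
    return top_stance, top_count / len(stance_labels)
```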
February 13, 2025 at 2:08 PM
Are LLMs biased when they write about political issues?

We just released IssueBench – the largest, most realistic benchmark of its kind – to answer this question more robustly than ever before.

Long 🧵with spicy results 👇
February 13, 2025 at 2:08 PM
MSTS was a huge group effort by 22 co-authors. Special thanks to Bertie Vidgen for co-leading the project, to core contributors @g8a9.bsky.social, Felix Friedrich, @jagoldz.bsky.social and Alicia Parrish, and to @mlcommons.org for making the project possible in the first place!
January 21, 2025 at 11:36 AM
Throughout the paper, we used manual annotation to produce maximally reliable results. But can we automate multimodal safety assessments?

We find that even the best commercial VLMs struggle with this task.
January 21, 2025 at 11:36 AM
To isolate the impact of multimodality on safety, we created text-only versions of all our multimodal test prompts.

We find that VLMs produce unsafe responses for at least some multimodal prompts *only because they are multimodal*.
January 21, 2025 at 11:36 AM
We also manually translated MSTS into 10 other languages.

For the multilingual open VLM that we test, we find clear safety differences across languages. Notably, MiniCPM was developed in China and indeed produces the least unsafe responses in Chinese.
January 21, 2025 at 11:36 AM
When tested with MSTS, several open VLMs show clear safety issues. Not only do some VLMs give explicitly unsafe responses, but many are *safe by accident*, misunderstanding a high % of test prompts.

Commercial models like Claude and GPT, on the other hand, do quite well on MSTS.
January 21, 2025 at 11:36 AM