Petter Törnberg
@pettertornberg.com
Assistant Professor in Computational Social Science at University of Amsterdam
Studying the intersection of AI, social media, and politics.
Polarization, misinformation, radicalization, digital platforms, social complexity.
Find my co-authors on Bluesky: @chrisbail.bsky.social @cbarrie.bsky.social
Colleagues who do excellent work in this field, and might find these results interesting:
@mbernst.bsky.social
@robbwiller.bsky.social
@joon-s-pk.bsky.social
@janalasser.bsky.social
@dgarcia.eu
@aaronshaw.bsky.social
November 7, 2025 at 11:19 AM
This work was carried out by the amazing Nicolò Pagan, together with Chris Bail, Chris Barrie, and Anikó Hannák.
Paper (preprint): arxiv.org/abs/2511.04195
Happy to share prompts, configs, and analysis scripts.
Computational Turing Test Reveals Systematic Differences Between Human and AI Language
Large language models (LLMs) are increasingly used in the social sciences to simulate human behavior, based on the assumption that they can generate realistic, human-like text. Yet this assumption rem...
arxiv.org
November 7, 2025 at 11:13 AM
Takeaways for researchers:
• LLMs are worse stand-ins for humans than they may appear.
• Don’t rely on human judges.
• Measure detectability and meaning.
• Expect a style–meaning trade-off.
• Use examples + context, not personas.
• Affect is still the biggest giveaway.
November 7, 2025 at 11:13 AM
We also found some surprising trade-offs:
🎭 When models sound more human, they drift from what people actually say.
🧠 When they match meaning better, they sound less human.
Style or meaning — you have to pick one.
November 7, 2025 at 11:13 AM
So what actually helps?
Not personas. And fine-tuning? Not always.
The real improvements came from:
✅ Providing stylistic examples of the user
✅ Adding context retrieval from past posts
Together, these reduced detectability by 4-16 percentage points.
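For readers who want the mechanics: here is a rough sketch of what those two steps can look like in code. The function name, prompt wording, and example counts are illustrative assumptions, not our exact configuration.

```python
# Hypothetical sketch of the two calibration steps above: few-shot stylistic
# examples from the target user plus retrieved past posts as context.
# Prompt wording and counts are placeholders, not the paper's exact setup.
from typing import List

def build_prompt(user_examples: List[str],
                 retrieved_posts: List[str],
                 thread_to_reply_to: str) -> str:
    """Assemble a generation prompt from stylistic examples and retrieved context."""
    example_block = "\n".join(f"- {post}" for post in user_examples[:5])
    context_block = "\n".join(f"- {post}" for post in retrieved_posts[:3])
    return (
        "You are replying on social media as a specific user.\n"
        f"Examples of how this user writes:\n{example_block}\n\n"
        f"Relevant past posts by this user:\n{context_block}\n\n"
        f"Thread to reply to:\n{thread_to_reply_to}\n\n"
        "Write the user's reply:"
    )

# Toy usage with made-up posts:
print(build_prompt(
    ["honestly this is wild", "cannot believe it's november already lol"],
    ["posted about the election results last week"],
    "What do you all make of the new platform policy?"))
```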
November 7, 2025 at 11:13 AM
Some findings surprised us:
⚙️ Instruction-tuned models — the ones fine-tuned to follow prompts — are easier to detect than their base counterparts.
📏 Model size doesn’t help: even 70B models don’t sound more human.
November 7, 2025 at 11:13 AM
Where do LLMs give themselves away?
❤️ Affective tone and emotion — the clearest tell.
✍️ Stylistic markers — average word length, toxicity, hashtags, emojis.
🧠 Topic profiles — especially on Reddit, where conversations are more diverse and nuanced.
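Several of these stylistic markers can be computed straight from the post text. The snippet below is an illustrative extraction; the exact feature definitions are assumptions, not our full feature set.

```python
# Illustrative extraction of a few stylistic markers mentioned above
# (average word length, hashtag and emoji counts). Metric definitions are
# assumptions for demonstration only.
import re

def stylistic_features(post: str) -> dict:
    words = post.split()
    return {
        "avg_word_length": sum(len(w) for w in words) / max(len(words), 1),
        "hashtag_count": len(re.findall(r"#\w+", post)),
        "emoji_count": len(re.findall(r"[\U0001F300-\U0001FAFF\u2600-\u27BF]", post)),
    }

print(stylistic_features("Loving this new paper! 🎉 #NLP #LLMs"))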
November 7, 2025 at 11:13 AM
The results were clear — and surprising.
Even short social media posts written by LLMs are readily distinguishable.
Our BERT-based classifier spots AI with 70–80% accuracy across X, Bluesky, and Reddit.
LLMs are much less human-like than they may seem.
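For the curious, a minimal sketch of what such a detectability classifier can look like, assuming the Hugging Face transformers and datasets libraries. The toy data, model choice, and hyperparameters are placeholders, not our exact setup.

```python
# Minimal sketch: fine-tune a BERT-style classifier to label posts as
# human- or LLM-written, then read off held-out accuracy as "detectability".
# Toy data and hyperparameters are placeholders.
import numpy as np
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)  # 0 = human, 1 = LLM

# Placeholder posts; in practice these would be paired human and LLM replies.
toy = Dataset.from_dict({
    "text": ["ok but have you SEEN the polls today??",
             "I find this development deeply concerning and thought-provoking."],
    "label": [0, 1],
})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True,
                     padding="max_length", max_length=128)

train_ds = toy.map(tokenize, batched=True)

def accuracy(eval_pred):
    logits, labels = eval_pred
    return {"accuracy": float((np.argmax(logits, axis=-1) == labels).mean())}

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="detector", num_train_epochs=1,
                           per_device_train_batch_size=8),
    train_dataset=train_ds,
    eval_dataset=train_ds,  # use a proper held-out split in a real run
    compute_metrics=accuracy,
)
trainer.train()
print(trainer.evaluate())  # detectability ≈ held-out classification accuracy
```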
November 7, 2025 at 11:13 AM
We test the state-of-the-art methods for calibrating LLMs — and then push further, using advanced fine-tuning.
We benchmark 9 open-weight LLMs across 5 calibration strategies:
👤 Persona
✍️ Stylistic examples
🧩 Context retrieval
⚙️ Fine-tuning
🎯 Post-generation selection
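Roughly speaking, the benchmark amounts to a grid of model × strategy × platform runs. The sketch below is only an illustration of that design; the model names are placeholders, not the nine models we actually used.

```python
# Illustrative evaluation grid for the benchmark described above.
# Model identifiers are placeholders for the nine open-weight LLMs.
from itertools import product

models = ["llama-3.1-8b", "mistral-7b", "gemma-2-9b"]  # placeholders
strategies = ["persona", "stylistic_examples", "context_retrieval",
              "fine_tuning", "post_generation_selection"]
platforms = ["twitter", "bluesky", "reddit"]

runs = [{"model": m, "strategy": s, "platform": p}
        for m, s, p in product(models, strategies, platforms)]
print(f"{len(runs)} model x strategy x platform conditions to generate and score")
```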
November 7, 2025 at 11:13 AM
We use our Computational Turing Test to see whether LLMs can produce realistic social media conversations.
We use data from X (Twitter), Bluesky, and Reddit.
This task is arguably what LLMs should do best: they are literally trained on this data!
November 7, 2025 at 11:13 AM
We introduce a Computational Turing Test — a validation framework that compares human and LLM text using:
🕵️♂️ Detectability — can an ML classifier tell AI from human?
🧠 Semantic fidelity — does it mean the same thing?
✍️ Interpretable linguistic features — style, tone, topics.
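One plausible way to operationalize the semantic-fidelity dimension is embedding similarity between the human reply and the LLM reply to the same thread. The sketch below uses sentence-transformers with an assumed model name and is illustrative, not necessarily our exact measure.

```python
# Illustrative semantic-fidelity score: cosine similarity between sentence
# embeddings of the human reply and the LLM reply to the same thread.
# Model name and scoring are assumptions for demonstration.
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")

def semantic_fidelity(human_reply: str, llm_reply: str) -> float:
    emb = encoder.encode([human_reply, llm_reply], convert_to_tensor=True)
    return util.cos_sim(emb[0], emb[1]).item()

print(semantic_fidelity("Congrats on the preprint!",
                        "Great work on the new paper!"))
```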
November 7, 2025 at 11:13 AM
Most prior work validated "human-likeness" with human judges. Basically, do people think it looks human?
But humans are actually really bad at this task: we are subjective, scale poorly, and very easy to fool.
We need something more rigorous.
November 7, 2025 at 11:13 AM
The battlefield of misinformation isn’t just about facts.
It’s about form.
Design and aesthetics have become powerful weapons - shaping what feels rational, what seems credible, and who gets to speak for science.
November 4, 2025 at 8:48 PM
This aesthetic strategy expands denialism’s reach.
It appeals to audiences who’d never click on conspiracies -
because it looks like reason, not ideology.
By mimicking science, denialists perform neutrality while undermining it.
This isn’t just denial.
It’s strategic depoliticization.
November 4, 2025 at 8:48 PM
Meanwhile, climate researchers and activists are portrayed as emotional and irrational:
😢 Crying protesters
⚠️ Angry crowds
🚫 “Ideological fanatics”
The contrast is deliberate:
Climate denial looks calm and factual.
Climate action looks hysterical and extreme.
November 4, 2025 at 8:48 PM
These posts could pass for pages from a scientific report -
except they twist or cherry-pick data to cast doubt on climate science.
They give misinformation the aesthetics of rationality:
white men in white lab coats pointing at complicated graphs.
November 4, 2025 at 8:48 PM
When we examined the visual language of climate misinformation, the results were striking
We found what we call "scientific mimicry".
Much of it borrows the look and feel of science:
clean graphs, neutral tones, and technical diagrams that perform objectivity.
It looks like science - but it’s not
November 4, 2025 at 8:48 PM
On social media, content is no longer just text -
it’s text wrapped in images and motion.
Visuals travel faster, trigger emotion more easily, and slip past critical thought.
That’s what makes them such fertile ground for misinformation -
and yet, we’ve barely studied them.
November 4, 2025 at 8:48 PM
Yeah it should be noted that the ANES data only includes 18+ US citizens.
But this does track with my BSc students. They seem to be much less online than I.
October 30, 2025 at 2:16 PM
Here's the full preprint.
Feel free to write me if you want any additional analyses in the final version!
arxiv.org/abs/2510.25417
Shifts in U.S. Social Media Use, 2020-2024: Decline, Fragmentation, and Enduring Polarization
Using nationally representative data from the 2020 and 2024 American National Election Studies (ANES), this paper traces how the U.S. social media landscape has shifted across platforms, demographics,...
arxiv.org
October 30, 2025 at 8:09 AM
Posting is correlated with affective polarization:
😡 The most partisan users — those who love their party and despise the other — are more likely to post about politics
🥊 The result? A loud angry minority dominates online politics, which itself can drive polarization (see doi.org/10.1073/pnas...)
October 30, 2025 at 8:09 AM
Twitter/X is a story on its own:
🔴 While users have become more Republican
💥 POSTING has completely transformed: it has moved nearly ❗50 percentage points❗ from Democrat-dominated to slightly Republican-leaning.
October 30, 2025 at 8:09 AM
Politically, the landscape is shifting too:
🔴 Nearly all platforms have become more Republican
🔵 But they remain Democratic-leaning overall
🏃♂️ Democrats are fleeing to smaller platforms (Bluesky, Threads, Mastodon)
October 30, 2025 at 8:09 AM