Joachim Baumann
@joachimbaumann.bsky.social
Postdoc @stanfordnlp.bsky.social / previously @milanlp.bsky.social / Computational social science, LLMs, algorithmic fairness
Pinned
🚨 New paper alert 🚨 Using LLMs as data annotators, you can produce any scientific result you want. We call this **LLM Hacking**.

Paper: arxiv.org/pdf/2509.08825
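To make the idea concrete, here is a minimal, made-up sketch of what LLM hacking looks like in practice: the same research question and the same downstream test, run over several plausible annotator configurations, can return diverging conclusions, so selectively reporting the one that "works" manufactures a finding. The model names, error rates, and bias terms below are illustrative assumptions, not the paper's actual setup.

```python
# Illustrative simulation of "LLM hacking" (all names and numbers are made up):
# the same question, annotated under several plausible LLM configurations and
# analysed with the same downstream test, yields diverging conclusions.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

n_texts = 500
group = rng.integers(0, 2, n_texts)           # e.g., two speaker groups
true_label = rng.binomial(1, 0.30, n_texts)   # ground truth: no group effect at all

# Hypothetical annotator configurations (model x prompt), each with its own
# error rate and its own small group-dependent bias.
configs = [("model_a", "prompt_1"), ("model_a", "prompt_2"),
           ("model_b", "prompt_1"), ("model_b", "prompt_2")]

for model, prompt in configs:
    err = rng.uniform(0.05, 0.25)             # config-specific annotation error rate
    bias = rng.uniform(0.00, 0.15)            # config-specific extra error for group 1
    flip = rng.binomial(1, err + bias * group)
    llm_label = np.where(flip == 1, 1 - true_label, true_label)

    # Identical downstream analysis every time: does the label differ by group?
    table = np.array([[np.sum((group == g) & (llm_label == y)) for y in (0, 1)]
                      for g in (0, 1)])
    chi2, p, dof, expected = stats.chi2_contingency(table)
    print(f"{model}/{prompt}: p = {p:.3f}")

# The p-values disagree across configurations even though the ground truth has
# no effect; reporting only the configuration that crosses p < 0.05 is LLM hacking.
```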
Dirk and Debora are amazing postdoc advisors, and the @milanlp.bsky.social team is fun fun fun ❤️ you should apply!
🚀 We’re opening 2 fully funded postdoc positions in #NLP!

Join the MilaNLP team and contribute to our upcoming research projects.

🔗 More details: milanlproc.github.io/open_positio...

⏰ Deadline: Jan 31, 2026
December 18, 2025 at 4:51 PM
Reposted by Joachim Baumann
Did you know that, starting tomorrow, Qualtrics is offering synthetic panels (AI-generated participants)?

Follow me down a rabbit hole I'm calling "doing science is tough and I'm so busy, can't we just make up participants?"
December 16, 2025 at 5:38 PM
Reposted by Joachim Baumann
At today’s lab reading group @carolin-holtermann.bsky.social presented ‘Fairness through Difference Awareness: Measuring Desired Group Discrimination in LLMs’ by @angelinawang.bsky.social et al. (2025).
Lots to think about how we evaluate fairness in language models!

#NLProc #fairness #LLMs
December 11, 2025 at 11:55 AM
Now that the hype has cooled off, here's my take on AI-generated survey answers:
This is a real problem, but the paper's core insights aren't exactly news!
A thread with the key takeaways... 🧵

[Image: the LLM system prompt used]
November 28, 2025 at 10:54 AM
Reposted by Joachim Baumann
Another exhausting day in the lab… conducting very rigorous panettone analysis. Pandoro was evaluated too, because we believe in fair experimental design.
November 27, 2025 at 4:06 PM
Reposted by Joachim Baumann
For our weekly reading group, @joachimbaumann.bsky.social presented the upcoming PNAS article "The potential existential threat of large language models to online survey research" by @seanjwestwood.bsky.social.
November 20, 2025 at 11:54 AM
Reposted by Joachim Baumann
Google AI Overviews now reach over 2B users worldwide. But how reliable are they on high-stakes topics, for instance pregnancy and baby care?

We have a new paper - led by Desheng Hu, now accepted at @icwsm.bsky.social - exploring that and finding many issues

Preprint: arxiv.org/abs/2511.12920
🧵👇
Auditing Google's AI Overviews and Featured Snippets: A Case Study on Baby Care and Pregnancy (arxiv.org)
November 19, 2025 at 4:58 PM
Reposted by Joachim Baumann
Trying an experiment in good old-fashioned blogging about papers: dallascard.github.io/granular-mat...
Language Model Hacking - Granular Material (dallascard.github.io)
November 16, 2025 at 7:51 PM
Reposted by Joachim Baumann
Next Wednesday, we are very excited to have
@joachimbaumann.bsky.social, who will present co-authored work on "Large Language Model Hacking: Quantifying the Hidden Risks of Using LLMs for Text Annotation". Paper and information on how to join ⬇️
November 8, 2025 at 1:31 PM
Reposted by Joachim Baumann
Can AI simulate human behavior? 🧠
The promise is revolutionary for science & policy. But there’s a huge "IF": Do these simulations actually reflect reality?
To find out, we introduce SimBench: The first large-scale benchmark for group-level social simulation. (1/9)
October 28, 2025 at 4:54 PM
Cool paper by @eddieyang.bsky.social, confirming our LLM hacking findings (arxiv.org/abs/2509.08825):
✓ LLMs are brittle data annotators
✓ Downstream conclusions flip frequently: LLM hacking risk is real!
✓ Bias correction methods can help but have trade-offs (see the sketch below)
✓ Use human experts whenever possible
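On the bias-correction point, here is a minimal sketch of one generic strategy: spend a small amount of expert labeling on a gold subset, estimate the LLM annotator's error rates from it, and correct the naive prevalence estimate (a Rogan-Gladen-style misclassification correction). This illustrates the general idea only, not the specific method evaluated in either paper; the sample sizes and error rates are assumptions.

```python
# Minimal sketch of one generic bias-correction idea (not the specific method
# from either paper): estimate the LLM annotator's error rates on a small
# expert-labeled gold subset, then correct the naive prevalence estimate
# (Rogan-Gladen-style misclassification correction). All numbers are assumptions.
import numpy as np

rng = np.random.default_rng(1)

n, n_gold = 5000, 200                 # corpus size and expert-labeled subset size
true = rng.binomial(1, 0.25, n)       # unknown ground truth (prevalence 0.25)
sens, spec = 0.80, 0.90               # annotator's true (unknown) sensitivity/specificity
llm = np.where(true == 1, rng.binomial(1, sens, n), rng.binomial(1, 1 - spec, n))

naive = llm.mean()                    # biased: mixes true positives with false positives

# Spend scarce expert effort on a random gold subset to estimate the error rates.
gold = rng.choice(n, n_gold, replace=False)
sens_hat = llm[gold][true[gold] == 1].mean()
spec_hat = 1 - llm[gold][true[gold] == 0].mean()

# Rogan-Gladen correction: observed = pi*sens + (1 - pi)*(1 - spec), solved for pi.
corrected = (naive + spec_hat - 1) / (sens_hat + spec_hat - 1)

print(f"true prevalence    {true.mean():.3f}")
print(f"naive LLM estimate {naive:.3f}")
print(f"corrected estimate {corrected:.3f}")
# Trade-off: the correction removes the bias but inherits extra variance from the
# small gold sample; with enough annotation budget, expert labels alone may be better.
```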
October 21, 2025 at 8:02 AM
Reposted by Joachim Baumann
Looks interesting! We have been facing this exact issue - finding big inconsistencies across different LLMs rating the same text.
🚨 New paper alert 🚨 Using LLMs as data annotators, you can produce any scientific result you want. We call this **LLM Hacking**.

Paper: arxiv.org/pdf/2509.08825
September 25, 2025 at 2:58 PM
Reposted by Joachim Baumann
About last week’s internal hackathon 😏
Last week, we, the (Amazing) Social Computing Group, held an internal hackathon to work on what we informally call our "Cultural Imperialism" project.
September 17, 2025 at 8:24 AM
Reposted by Joachim Baumann
If you feel uneasy using LLMs for data annotation, you are right to be (and if not, you should be). LLM annotation opens up new opportunities for research that is difficult with traditional #NLP/#textasdata methods, but the risk of false conclusions is high!

Experiment + *evidence-based* mitigation strategies in this preprint 👇
🚨 New paper alert 🚨 Using LLMs as data annotators, you can produce any scientific result you want. We call this **LLM Hacking**.

Paper: arxiv.org/pdf/2509.08825
September 15, 2025 at 1:05 PM
🚨 New paper alert 🚨 Using LLMs as data annotators, you can produce any scientific result you want. We call this **LLM Hacking**.

Paper: arxiv.org/pdf/2509.08825
September 12, 2025 at 10:33 AM
Breaking my social media silence because this news is too good not to share! 🎉
Just joined @milanlp.bsky.social as a Postdoc, working with the amazing @dirkhovy.bsky.social on large language models and computational social science!
July 29, 2025 at 12:07 PM
Reposted by Joachim Baumann
🎉 The @milanlp.bsky.social lab is excited to present 15 papers and 1 tutorial at #ACL2025 & workshops! Grateful to all our amazing collaborators, see everyone in Vienna! 🚀
July 16, 2025 at 12:11 PM