Joachim Baumann
@joachimbaumann.bsky.social
Postdoc @milanlp.bsky.social / Incoming Postdoc @stanfordnlp.bsky.social / Computational social science, LLMs, algorithmic fairness
Thank you, Florian :) We use two correction methods, CDI and DSL. Both debias the LLM annotations and cut false-positive (Type I) conclusions to about 3-13% on average, but at the cost of a much higher Type II risk (up to 92%). Human-only annotations also give a pretty low Type I risk, while keeping the Type II risk lower.
September 14, 2025 at 6:55 AM
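For readers unfamiliar with the kind of debiasing the reply refers to: below is a minimal, illustrative sketch of a DSL-style correction for estimating a simple proportion. It is not the paper's implementation; it only assumes LLM annotations for all items plus gold human labels for a random subset drawn with a known sampling probability, and all variable names are made up for the example.

```python
import numpy as np

def dsl_mean_estimate(llm_labels, human_labels, labeled_mask, sampling_prob):
    """DSL-style bias-corrected estimate of a mean/proportion.

    llm_labels:    LLM annotations for all N items (possibly biased surrogate)
    human_labels:  gold labels, only valid where labeled_mask is True
    labeled_mask:  boolean array, True for the randomly human-coded subset
    sampling_prob: known probability of an item being human-coded
    """
    llm_labels = np.asarray(llm_labels, dtype=float)
    human_labels = np.asarray(human_labels, dtype=float)
    r = np.asarray(labeled_mask, dtype=float)

    # Design-based pseudo-outcome: start from the LLM annotation and add an
    # inverse-probability-weighted correction on the human-coded subset only.
    correction = np.where(labeled_mask, human_labels - llm_labels, 0.0)
    pseudo = llm_labels + (r / sampling_prob) * correction

    est = pseudo.mean()
    se = pseudo.std(ddof=1) / np.sqrt(len(pseudo))  # simple plug-in standard error
    return est, se

# Toy usage: LLM systematically over-predicts the positive class; humans code 5%.
rng = np.random.default_rng(0)
truth = rng.binomial(1, 0.30, 10_000)
llm = np.clip(truth + rng.binomial(1, 0.10, 10_000), 0, 1)
mask = rng.random(10_000) < 0.05
est, se = dsl_mean_estimate(llm, np.where(mask, truth, np.nan), mask, 0.05)
print(f"debiased estimate: {est:.3f} +/- {1.96 * se:.3f}")
```

The key idea, consistent with the thread: the correction removes the LLM's systematic bias (lower Type I risk), but because it leans on a small human-coded subset, the resulting estimates are noisier (higher Type II risk).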
🚨 New paper alert 🚨 Using LLMs as data annotators, you can produce any scientific result you want. We call this **LLM Hacking**.
Paper: arxiv.org/pdf/2509.08825
September 12, 2025 at 10:33 AM