Joachim Baumann
@joachimbaumann.bsky.social
Postdoc @milanlp.bsky.social / Incoming Postdoc @stanfordnlp.bsky.social / Computational social science, LLMs, algorithmic fairness
Thank you, Florian :) We use two correction methods, CDI and DSL. Both debias the LLM annotations and cut false-positive (Type I) conclusions to about 3-13% on average, but at the cost of a much higher Type II risk (up to 92%). Human-only annotations also give a pretty low Type I risk, while keeping the Type II risk lower.
September 14, 2025 at 6:55 AM
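For readers unfamiliar with the kind of debiasing the reply refers to: below is a minimal, illustrative sketch of a DSL-style correction for estimating a simple proportion. It is not the paper's implementation; it only assumes LLM annotations for all items plus gold human labels for a random subset drawn with a known sampling probability, and all variable names are made up for the example.

```python
import numpy as np

def dsl_mean_estimate(llm_labels, human_labels, labeled_mask, sampling_prob):
    """DSL-style bias-corrected estimate of a mean/proportion.

    llm_labels:    LLM annotations for all N items (possibly biased surrogate)
    human_labels:  gold labels, only valid where labeled_mask is True
    labeled_mask:  boolean array, True for the randomly human-coded subset
    sampling_prob: known probability of an item being human-coded
    """
    llm_labels = np.asarray(llm_labels, dtype=float)
    human_labels = np.asarray(human_labels, dtype=float)
    r = np.asarray(labeled_mask, dtype=float)

    # Design-based pseudo-outcome: start from the LLM annotation and add an
    # inverse-probability-weighted correction on the human-coded subset only.
    correction = np.where(labeled_mask, human_labels - llm_labels, 0.0)
    pseudo = llm_labels + (r / sampling_prob) * correction

    est = pseudo.mean()
    se = pseudo.std(ddof=1) / np.sqrt(len(pseudo))  # simple plug-in standard error
    return est, se

# Toy usage: LLM systematically over-predicts the positive class; humans code 5%.
rng = np.random.default_rng(0)
truth = rng.binomial(1, 0.30, 10_000)
llm = np.clip(truth + rng.binomial(1, 0.10, 10_000), 0, 1)
mask = rng.random(10_000) < 0.05
est, se = dsl_mean_estimate(llm, np.where(mask, truth, np.nan), mask, 0.05)
print(f"debiased estimate: {est:.3f} +/- {1.96 * se:.3f}")
```

The key idea, consistent with the thread: the correction removes the LLM's systematic bias (lower Type I risk), but because it leans on a small human-coded subset, the resulting estimates are noisier (higher Type II risk).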
🚨 New paper alert 🚨 Using LLMs as data annotators, you can produce any scientific result you want. We call this **LLM Hacking**.
Paper: arxiv.org/pdf/2509.08825
September 12, 2025 at 10:33 AM