Eddie Yang
@eddieyang.bsky.social
New paper: LLMs are increasingly used to label data in political science. But how reliable are these annotations, and what are the consequences for scientific findings? What are best practices? Some new findings from a large empirical evaluation.
Paper: eddieyang.net/research/llm_annotation.pdf
October 20, 2025 at 1:57 PM
Based on these findings (and more in the paper), we offer recommendations for best practices. We also summarize the recommendations in a checklist to facilitate a more principled annotation procedure.
October 20, 2025 at 1:53 PM
Finding 4: Bias-correction methods like DSL can reduce bias, but they introduce a trade-off: corrected estimates often have larger standard errors, so a fairly large ground-truth sample (600-1000+) is needed for the correction to be beneficial without losing too much precision.
October 20, 2025 at 1:53 PM
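The general idea behind this kind of correction can be illustrated with a much simpler design-based estimator than DSL itself: label everything with the LLM, hand-code a random subsample, and shift the LLM-based estimate by the average discrepancy seen in that subsample. The sketch below is a simplified stand-in (not the DSL estimator from the paper), with simulated data, just to show why the corrected estimate's standard error shrinks as the ground-truth sample grows.

    # Minimal sketch: correcting an LLM-based prevalence estimate with a hand-coded subsample.
    # Simplified difference estimator, NOT the actual DSL estimator; all data are simulated.
    import numpy as np

    rng = np.random.default_rng(3)
    n = 20_000

    human = rng.integers(0, 2, size=n)                    # simulated "true" labels
    flip_up = (human == 0) & (rng.random(n) < 0.30)       # LLM over-predicts the positive class
    flip_down = (human == 1) & (rng.random(n) < 0.05)
    llm = np.where(flip_up, 1, np.where(flip_down, 0, human))

    for n_gold in [100, 600, 1000]:
        gold_idx = rng.choice(n, size=n_gold, replace=False)  # random hand-coded subsample
        discrepancy = human[gold_idx] - llm[gold_idx]

        corrected = llm.mean() + discrepancy.mean()            # shift the naive estimate
        se = discrepancy.std(ddof=1) / np.sqrt(n_gold)         # SE driven by the correction term

        print(f"n_gold={n_gold:5d}: naive={llm.mean():.3f}, corrected={corrected:.3f} "
              f"(SE of correction ~{se:.3f}), simulated truth={human.mean():.3f}")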
Finding 3: In-context learning (providing a few annotated examples in the prompt) offers only marginal improvements in reliability, with benefits plateauing quickly. Changes to prompt format have a small effect (smaller models and reasoning models are more sensitive).
October 20, 2025 at 1:53 PM
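As an illustration of what "in-context learning" means here, the sketch below assembles a few annotated examples into an annotation prompt. The example texts, labels, and prompt wording are hypothetical placeholders, and the call to an actual LLM API is left out.

    # Minimal sketch: building a few-shot (in-context learning) annotation prompt.
    # Example documents, labels, and wording are hypothetical placeholders.

    few_shot_examples = [
        ("The senator praised the new infrastructure bill.", "positive"),
        ("The governor's handling of the crisis was widely criticized.", "negative"),
        ("The committee will meet again next Tuesday.", "neutral"),
    ]

    def build_prompt(text: str, examples: list[tuple[str, str]]) -> str:
        """Return an annotation prompt with the labeled examples prepended."""
        lines = ["Label the tone of each text as positive, negative, or neutral.", ""]
        for example_text, label in examples:
            lines.append(f"Text: {example_text}")
            lines.append(f"Label: {label}")
            lines.append("")
        lines.append(f"Text: {text}")
        lines.append("Label:")
        return "\n".join(lines)

    prompt = build_prompt("The mayor announced a surprise tax cut.", few_shot_examples)
    print(prompt)  # send this to the LLM of your choice; the API call is omitted here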
Finding 2: This disagreement has significant downstream consequences. Re-running the original analyses with LLM annotations produced highly variable coefficient estimates, often altering the conclusions of the original studies.
October 20, 2025 at 1:53 PM
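To make the downstream-consequences point concrete, here is a small simulated sketch (not the paper's analyses): the same bivariate regression is re-run with labels from several hypothetical "LLMs" that disagree with the human coding at different rates, and the coefficient on the annotated variable moves around accordingly.

    # Minimal sketch: how annotation disagreement propagates into regression coefficients.
    # All data are simulated; this is not the paper's analysis.
    import numpy as np

    rng = np.random.default_rng(1)
    n = 2000

    x_true = rng.integers(0, 2, size=n)                   # human-coded binary feature
    y = 1.0 + 2.0 * x_true + rng.normal(0, 1, size=n)     # outcome with true coefficient 2.0

    def ols_slope(x, y):
        """Slope from a simple bivariate OLS of y on x."""
        X = np.column_stack([np.ones_like(x, dtype=float), x.astype(float)])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        return beta[1]

    print(f"human labels:      slope = {ols_slope(x_true, y):.2f}")

    # Hypothetical LLM annotators that flip the human label at different error rates.
    for name, error_rate in [("llm_a", 0.10), ("llm_b", 0.25), ("llm_c", 0.40)]:
        flip = rng.random(n) < error_rate
        x_llm = np.where(flip, 1 - x_true, x_true)
        print(f"{name} (err={error_rate:.0%}): slope = {ols_slope(x_llm, y):.2f}")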
There is also an interesting linear relationship between LLM-human and LLM-LLM annotation agreement: when LLMs agree more with each other, they also tend to agree more with humans and supervised models! We give some suggestions on which annotation tasks are a good fit for LLMs.
October 20, 2025 at 1:53 PM
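A rough sketch of how one might check this relationship on their own tasks: for each task, compute the average pairwise agreement among several LLMs and the average LLM-human agreement, then correlate the two across tasks. Everything below is simulated and hypothetical, not the paper's data or exact procedure.

    # Minimal sketch: relating inter-LLM agreement to LLM-human agreement across tasks.
    # Simulated labels; not the paper's data or exact procedure.
    import itertools
    import numpy as np

    rng = np.random.default_rng(2)
    n_docs, n_llms, n_tasks = 300, 3, 12

    inter_llm, llm_human = [], []
    for _ in range(n_tasks):
        noise = rng.uniform(0.05, 0.45)                    # per-task difficulty
        human = rng.integers(0, 2, size=n_docs)
        llms = [np.where(rng.random(n_docs) < noise, 1 - human, human) for _ in range(n_llms)]

        pairwise = [(a == b).mean() for a, b in itertools.combinations(llms, 2)]
        inter_llm.append(np.mean(pairwise))                                # avg LLM-LLM agreement
        llm_human.append(np.mean([(l == human).mean() for l in llms]))     # avg LLM-human agreement

    r = np.corrcoef(inter_llm, llm_human)[0, 1]
    print(f"Correlation across tasks: {r:.2f}")            # positive in this simulation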
Finding 1: LLM annotations show pretty low intercoder reliability with the original annotations (coded by humans or supervised models). Perhaps surprisingly, reliability among the different LLMs themselves is only moderate (larger models do better).
October 20, 2025 at 1:53 PM
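For readers who want to run this kind of reliability check on their own annotations, here is a minimal sketch using Cohen's kappa from scikit-learn on hypothetical label vectors; the paper may use a different reliability statistic, and the data below are simulated.

    # Minimal sketch: intercoder reliability between LLM and original annotations.
    # Hypothetical labels; the paper's data and reliability statistic may differ.
    import numpy as np
    from sklearn.metrics import cohen_kappa_score

    rng = np.random.default_rng(0)

    # Hypothetical binary annotations for 500 documents.
    human = rng.integers(0, 2, size=500)                        # human / supervised-model labels
    llm = np.where(rng.random(500) < 0.75, human, 1 - human)    # LLM agrees ~75% of the time

    percent_agreement = (human == llm).mean()
    kappa = cohen_kappa_score(human, llm)                       # chance-corrected agreement

    print(f"Percent agreement: {percent_agreement:.2f}")
    print(f"Cohen's kappa:     {kappa:.2f}")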