Katie Keith
@katakeith.bsky.social
NLP and computational social science (CSS) researcher. Assistant Professor in Computer Science at Williams College. AI2 and UMass Amherst alum. she/her. https://kakeith.github.io/
Whoa...!! If it's at all social-science leaning, maybe try other preprint servers? SocArXiv, for example? We put one of our preprints there: osf.io/preprints/so...
August 27, 2025 at 7:02 PM
Yes! I agree. It's so rare these days to see a keynote that is so thorough and full of new conceptualizations.
August 12, 2025 at 2:12 AM
Under review! Happy to share a draft if you email me. Thanks!
July 23, 2025 at 7:14 PM
Thanks:)
July 23, 2025 at 2:39 PM
Not as recent, but still LLM-based

"WANLI: Worker and AI Collaboration for Natural Language Inference Dataset Creation." GPT-3 composes new examples with similar patterns to challenging examples.

aclanthology.org/2022.finding...
July 23, 2025 at 1:05 PM
I thought this was a clever and useful paper from Xiong, ... Hovy, El-Assady, and Ash: "Co-DETECT: Collaborative Discovery of Edge Cases in Text Classification." It uses LLMs to help humans refine their codebooks (before the codebooks are fixed for the true annotation stage). arxiv.org/pdf/2507.05010
July 23, 2025 at 1:00 PM
We used active learning to create a human-annotated dataset of 1050 instances from FOMC transcripts, labeled for FOMC members' opinions and directional stance towards monetary policy. The preprint and dataset should be released publicly by the end of the summer, but email me for an advance copy.
July 23, 2025 at 12:53 PM
Yay! I'm there as well. Let's sync up.
July 20, 2025 at 11:31 AM
Personally, I find I have to burn a day answering all the questions (particularly for a dataset release). I think it should be condensed to the 5 most important ones.
May 20, 2025 at 6:27 PM
Our semi-synthetic experiments use MIMIC-III clinical notes and two open-weight LLMs and show that our method produces estimates with low bias.
December 11, 2024 at 1:10 AM
For settings with an unobserved (but known) confounding variable, we propose a new causal inference method that uses two instances of pre-treatment text data, infers two proxies using two zero-shot models on the separate instances, and applies these proxies in the proximal g-formula.
December 11, 2024 at 1:10 AM
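To make the two-proxy idea concrete, here is a minimal hypothetical sketch of the proximal g-formula with binary proxies W and Z standing in for the two zero-shot-inferred proxies. The simulation setup (variable names, distributions, effect sizes) is entirely assumed for illustration; the actual method in the preprint operates on text data and LLM outputs and may differ in its estimator.

```python
import numpy as np

def proximal_gformula_ate(A, Y, W, Z):
    """ATE via the proximal g-formula with binary proxies W and Z.

    For each treatment level a, solve the outcome-bridge system
        E[Y | Z=z, A=a] = sum_w h(w, a) * P(W=w | Z=z, A=a),  z in {0,1},
    then return E[h(W,1)] - E[h(W,0)], marginalizing over P(W).
    """
    p_w = np.array([np.mean(W == 0), np.mean(W == 1)])  # marginal P(W)
    means = []
    for a in (0, 1):
        M = np.empty((2, 2))  # M[z, w] = P(W=w | Z=z, A=a)
        b = np.empty(2)       # b[z]    = E[Y | Z=z, A=a]
        for z in (0, 1):
            mask = (A == a) & (Z == z)
            b[z] = Y[mask].mean()
            M[z, 0] = np.mean(W[mask] == 0)
            M[z, 1] = np.mean(W[mask] == 1)
        h = np.linalg.solve(M, b)  # bridge values h(0,a), h(1,a)
        means.append(p_w @ h)      # E[Y(a)]
    return means[1] - means[0]

# Toy simulation: unobserved confounder U, two noisy proxies W and Z,
# true treatment effect of 2.0. The naive difference in means is
# confounded by U; the proximal estimate should be close to 2.0.
rng = np.random.default_rng(0)
n = 200_000
U = rng.binomial(1, 0.5, n)
Z = U ^ rng.binomial(1, 0.1, n)          # proxy 1 (90% agreement with U)
W = U ^ rng.binomial(1, 0.1, n)          # proxy 2, independent noise
A = rng.binomial(1, 0.2 + 0.6 * U)       # treatment depends on U
Y = 2.0 * A + 3.0 * U + rng.normal(0, 1, n)

ate = proximal_gformula_ate(A, Y, W, Z)
naive = Y[A == 1].mean() - Y[A == 0].mean()
```

The key conditions in this toy setup are that W is independent of (A, Z) given U and that Y does not depend on Z directly, which is what lets the bridge function h absorb the confounding that U induces.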