Kristen Syme
@kristensyme.bsky.social
Research Fellow, University of Leicester, Psychology, Biocultural/Evolutionary Anthropology, kls52@leicester.ac.uk
Scrutinizing a sample of disagreements, we found that GPT provided the most accurate responses for 6 vars. For 3 vars, including gender, a 'majority rule' between the two humans and GPT provided the most accurate response. GPT overcoded 3 vars.
April 18, 2025 at 7:01 PM
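A minimal sketch of what a 'majority rule' between two human coders and GPT could look like. The function name, labels, and toy inputs are illustrative assumptions, not the procedure or data from the study.

```python
from collections import Counter

def majority_vote(codes):
    """Return the label chosen by more than half of the coders, else None.

    `codes` is a list with one label per coder, e.g. [human_1, human_2, gpt].
    """
    label, count = Counter(codes).most_common(1)[0]
    return label if count > len(codes) / 2 else None

# Binary variable: three coders always produce a majority.
binary_result = majority_vote([1, 0, 1])

# Multi-category variable such as gender: a three-way split has no majority,
# so the text would need to be resolved some other way (e.g. by discussion).
gender_result = majority_vote(["woman", "woman", "both"])
split_result = majority_vote(["man", "woman", "both"])
```

Returning None on a split keeps unresolved texts visible instead of silently picking a label.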
Running GPT across the unresolved 1,015 texts, the human coders had only slightly higher agreement and reliability with each other compared to GPT with each human. Although Gwet's AC1 was high for all vars, Cohen's kappa was much too low on 4 vars, indicating high agreement on 0s but low agreement on 1s.
April 18, 2025 at 6:54 PM
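A rough illustration of why Gwet's AC1 can stay high while Cohen's kappa drops when a code is rare. This sketch assumes two coders and binary 0/1 codes; the function name and the toy data are made up for illustration and are not the study's data.

```python
def agreement_stats(a, b):
    """Percent agreement, Cohen's kappa, and Gwet's AC1 for two binary coders."""
    n = len(a)
    po = sum(x == y for x, y in zip(a, b)) / n  # observed agreement
    pa1 = sum(a) / n  # share of 1s assigned by coder A
    pb1 = sum(b) / n  # share of 1s assigned by coder B

    # Cohen's kappa: chance agreement from each coder's own marginals.
    pe_kappa = pa1 * pb1 + (1 - pa1) * (1 - pb1)
    # Gwet's AC1 (two categories): chance agreement from the average prevalence.
    pi = (pa1 + pb1) / 2
    pe_ac1 = 2 * pi * (1 - pi)

    # Note: kappa is undefined (division by zero) if both coders use one label only.
    kappa = (po - pe_kappa) / (1 - pe_kappa)
    ac1 = (po - pe_ac1) / (1 - pe_ac1)
    return po, kappa, ac1

# Toy data: 100 texts, each coder marks 5 as 1, but they overlap on only 1.
coder_a = [1] * 5 + [0] * 95
coder_b = [1] * 1 + [0] * 4 + [1] * 4 + [0] * 91

po, kappa, ac1 = agreement_stats(coder_a, coder_b)
print(f"agreement={po:.2f} kappa={kappa:.2f} AC1={ac1:.2f}")
```

In this toy case the coders agree on 92% of texts, almost all of them 0s, so AC1 comes out high while kappa is low because the few 1s are rarely shared.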
We had GPT help us come to the 'correct' answers by having it provide a rationale for its response. In some cases GPT was wrong, but in other cases, it helped us identify cases that we humans mis-coded, as these examples with religiosity demonstrate.
April 18, 2025 at 6:48 PM
GPT performed nearly as well as humans at identifying whether the faster was a man or a woman, though it performed somewhat worse at identifying 'Both' due to greater ambiguity between, e.g., 'Both' and 'Unknown'.
April 18, 2025 at 6:43 PM
Measuring precision, recall, and F1, we found that GPT annotated as well as or better than humans on 5 vars, tended to overcode (i.e., high recall but lower precision) on 3 vars (tho it performs well on Visions and Knowledge overall), and performed poorly on 2 vars.
April 18, 2025 at 6:40 PM
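A small sketch of how precision, recall, and F1 can be computed for one binary variable against consensus codes, and how overcoding shows up in them. The function name and toy data are assumptions for illustration, not values from the study.

```python
def precision_recall_f1(gold, pred):
    """Precision, recall, and F1 for binary 0/1 codes, treating 1 as the positive class."""
    tp = sum(g == 1 and p == 1 for g, p in zip(gold, pred))  # correctly coded 1s
    fp = sum(g == 0 and p == 1 for g, p in zip(gold, pred))  # overcoded: 1 where gold is 0
    fn = sum(g == 1 and p == 0 for g, p in zip(gold, pred))  # missed 1s

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Toy example of overcoding: every true 1 is found, plus three extra 1s.
consensus = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
annotator = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]

precision, recall, f1 = precision_recall_f1(consensus, annotator)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Here recall is perfect because no true 1 is missed, but precision falls to 0.50 because half of the assigned 1s are not in the consensus codes.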
Starting with a sample of 225 texts that we came to a consensus on, we found high % agreement and inter-rater reliability between GPT 4.0 and the human annotators. We also found over 90% agreement of GPT with itself across two rounds.
April 18, 2025 at 6:36 PM
My co-author Caity Placek and I manually coded 1,240 paragraphs on ritual fasting from the HRAF and tested GPT 4.0's ability to annotate the texts and help us resolve discrepancies on 12 variables incl. gender of fasters, cognitive outcomes, and health and social outcomes.
April 18, 2025 at 6:33 PM