Lightnews — Scholar-powered news

Kristen Syme

@kristensyme.bsky.social

380 followers 880 following 23 posts

Research Fellow, University of Leicester, Psychology, Biocultural/Evolutionary Anthropology, kls52@leicester.ac.uk

Kristen Syme

@kristensyme.bsky.social

Scrutinizing a sample of disagreements, we found that GPT provided the most accurate responses for 6 vars. For 3 vars, including gender, a 'majority rule' between the two humans and GPT provided the most accurate response. GPT overcoded 3 vars.

April 18, 2025 at 7:01 PM
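The 'majority rule' in the post above can be sketched in a few lines. This is only an illustration under assumptions: the coder names, the toy labels, and the `majority_vote` helper are hypothetical and are not taken from the study's code or data.

```python
from collections import Counter

def majority_vote(labels):
    """Return the label chosen by most coders, or None when the top two tie."""
    counts = Counter(labels).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None  # no majority, e.g. three coders giving three different labels
    return counts[0][0]

# Hypothetical codes for one binary variable across four texts.
human_a = [1, 0, 1, 0]
human_b = [0, 0, 1, 1]
gpt = [1, 0, 0, 1]

# One consensus label per text, taken across the three coders.
consensus = [majority_vote(codes) for codes in zip(human_a, human_b, gpt)]
print(consensus)
```

With three coders and a binary variable there is always a majority; the `None` branch only matters for multi-category variables such as gender.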

Kristen Syme

@kristensyme.bsky.social

Running GPT across the unresolved 1,015 texts, the human coders had only slightly higher agreement and reliability with each other compared to GPT with each human. Although Gwet's AC1 was high for all vars, Cohen's kappa was much too low on 4 vars, indicating high agreement on 0s but low agreement on 1s.

April 18, 2025 at 6:54 PM
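The gap between Gwet's AC1 and Cohen's kappa in the post above is typical when a code is rare. Below is a minimal sketch for two raters and a binary variable, using the standard two-rater formulas. The function names and the toy counts are assumptions made for illustration, not the study's data.

```python
def observed_agreement(a, b):
    """Share of items on which the two raters give the same code."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohens_kappa(a, b):
    """Cohen's kappa for two raters and binary (0/1) codes."""
    po = observed_agreement(a, b)
    pa, pb = sum(a) / len(a), sum(b) / len(b)   # each rater's share of 1s
    pe = pa * pb + (1 - pa) * (1 - pb)          # chance agreement from the marginals
    return (po - pe) / (1 - pe)                 # undefined if pe == 1 (not handled here)

def gwets_ac1(a, b):
    """Gwet's AC1 for two raters and binary (0/1) codes."""
    po = observed_agreement(a, b)
    pi = (sum(a) / len(a) + sum(b) / len(b)) / 2  # mean prevalence of 1s
    pe = 2 * pi * (1 - pi)                        # chance agreement under AC1
    return (po - pe) / (1 - pe)

# Hypothetical rare code: 90 joint 0s, 2 joint 1s, 8 disagreements.
rater_a = [0] * 90 + [1] * 2 + [1] * 4 + [0] * 4
rater_b = [0] * 90 + [1] * 2 + [0] * 4 + [1] * 4

kappa = cohens_kappa(rater_a, rater_b)
ac1 = gwets_ac1(rater_a, rater_b)
print(round(kappa, 2), round(ac1, 2))
```

In this toy case raw agreement is 92%, yet the raters agree on only 2 of the 10 texts that either of them coded 1, so kappa comes out low while AC1 stays high.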

Kristen Syme

@kristensyme.bsky.social

We had GPT help us come to the 'correct' answers by having it provide a rationale for its response. In some cases GPT was wrong, but in other cases, it helped us identify cases that we humans mis-coded, as these examples with religiosity demonstrate.

April 18, 2025 at 6:48 PM

Kristen Syme

@kristensyme.bsky.social

GPT performed nearly as well as humans on identifying whether the gender of the faster was a man or a woman, though it performed somewhat worse for identifying 'Both' due to greater ambiguity between, e.g., 'Both' and 'Unknown'.

April 18, 2025 at 6:43 PM

Kristen Syme

@kristensyme.bsky.social

Measuring precision, recall, and F1, we found that GPT annotated as well or better than humans on 5 vars, tended to overcode (i.e., high recall) on 3 vars (tho it performs well on Visions and Knowledge overall), and performed poorly on 2 vars.

April 18, 2025 at 6:40 PM
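The link between overcoding and high recall in the post above can be shown with a small sketch. It assumes the human consensus codes serve as the gold standard and GPT's codes as predictions; the `precision_recall_f1` helper and the toy labels are hypothetical.

```python
def precision_recall_f1(gold, pred):
    """Precision, recall, and F1 for binary codes, treating 1 as the positive class."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical variable: the annotator codes 1 on every true case plus two extra texts.
gold = [1, 1, 0, 0, 0, 0, 0, 0]
pred = [1, 1, 1, 1, 0, 0, 0, 0]

precision, recall, f1 = precision_recall_f1(gold, pred)
print(precision, recall, round(f1, 3))
```

An overcoding annotator misses no true cases, so recall is perfect, but the extra 1s pull precision down and F1 with it.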

Kristen Syme

@kristensyme.bsky.social

Starting with a sample of 225 texts that we came to a consensus on, we found high % agreement and inter-rater reliability between GPT 4.0 and the human annotators. We also found over 90% agreement of GPT with itself on two rounds.

April 18, 2025 at 6:36 PM

Kristen Syme

@kristensyme.bsky.social

My co-author Caity Placek and I manually coded 1,240 paragraphs on ritual fasting from the HRAF and tested GPT 4.0's ability to annotate the texts and help us resolve discrepancies on 12 variables incl. gender of fasters, cognitive outcomes, and health and social outcomes.

April 18, 2025 at 6:33 PM
