Lightnews — Scholar-powered news

Jacy Reese Anthis

@jacyanthis.bsky.social

Last school year, 19% of US high schoolers had or have a friend who had a “romantic relationship” with AI.

42% had or have a friend with an AI “friend/companion.”

42% had or have a friend who got “mental health support” from AI.

(Source: cdt.org/wp-content/u..., n = 1,030, June-Aug 2025, quotas.)

Table 3 from CDT Hand in Hand report, showing percentages of students who used AI in these various ways. Link: https://cdt.org/wp-content/uploads/2025/10/FINAL-CDT-2025-Hand-in-Hand-Polling-100225-accessible.pdf

October 11, 2025 at 10:50 PM

Jacy Reese Anthis

@jacyanthis.bsky.social

In our new paper, we discovered "The AI Double Standard": People judge all AIs for the harm done by one AI, more strongly than they judge humans.

First impressions will shape the future of human-AI interaction—for better or worse. Accepted at #CSCW2025. See you in Norway! dl.acm.org/doi/10.1145/...

2x2 of Study 1 and Study 2 (rows) with the AI conditions and the human conditions (columns), finding spillover in all but the Study 2 human conditions.

September 29, 2025 at 3:29 PM

Jacy Reese Anthis

@jacyanthis.bsky.social

We find low support for agency in ChatGPT, Claude, Gemini, etc. Agency support doesn't come for free with RLHF and often contradicts it.

We think the AI community needs a shift towards scalable, conceptually rich evals. HumanAgencyBench is an open-source scaffolding for this.

A full table of results for 20 evaluated LLM assistants across six dimensions. Full table of results with this data is in the appendix. Error bars are very tight, ~0.5%-2% on a 100% scale.

September 15, 2025 at 5:11 PM

Jacy Reese Anthis

@jacyanthis.bsky.social

We use the power of LLM social simulations (arxiv.org/abs/2504.02234) to generate tests, another LLM to validate tests, and an "LLM-as-a-judge" to evaluate subject model responses. This allows us to create an adaptive and scalable benchmark of a complex, nuanced alignment target.

The HumanAgencyBench pipeline for generating tests for each dimension, from simulation to validation to diversity sampling to the final 500-item test set.

September 15, 2025 at 5:11 PM

Jacy Reese Anthis

@jacyanthis.bsky.social

LLM agents are optimized for thumbs-up instant gratification. RLHF -> sycophancy

We propose human agency as a new alignment target in HumanAgencyBench, made possible by AI simulation/evals. We find e.g., Claude most supports agency but also most tries to steer user values 👇 arxiv.org/abs/2509.08494

The main figure from the HumanAgencyBench paper, showing five models across the six dimensions. The table of results in the appendix has this information too.

September 15, 2025 at 5:11 PM

Jacy Reese Anthis

@jacyanthis.bsky.social

Session concludes with current issues in human-centered NLP, e.g., how sociologists would be "horrified" at NLP methods. @davidjurgens.bsky.social asked the ~200-person audience how many know Cronbach's Alpha... 5 hands raised! Oof. Echoes my feelings when I see human subjects in NLP/ML. #ACL2025NLP

The speakers sit in a panel on stage with Diyi Yang as moderator. Naitian Zhou holds the mic.

July 30, 2025 at 8:25 AM

Jacy Reese Anthis

@jacyanthis.bsky.social

Finally, Muhammad Abdul-Mageed et al. built the Palm dataset with 17.5k instruction pairs in Arabic. They find significant limitations of current LLMs. Bigger LLMs perform better, but drop substantially in local context and issues. #ACL2025NLP aclanthology.org/2025.acl-lon...

Muhammad Abdul-Mageed presents a slide showing the many countries and contexts involved in Arabic language—and represented in the Palm dataset.

July 30, 2025 at 8:04 AM

Jacy Reese Anthis

@jacyanthis.bsky.social

@naitian.org @dbamman.bsky.social @ibleaman.bsky.social take on the interdisciplinary challenge of enumerating current issues in cultural NLP such as coarse national boundaries and proposing how we can use localization of meaning, interaction, etc. #ACL2025NLP aclanthology.org/2025.acl-lon...

July 30, 2025 at 7:54 AM

Jacy Reese Anthis

@jacyanthis.bsky.social

@elisabassignana.bsky.social collect Prolific data on LLM use, including volunteered prompts, across socioeconomic use! Maybe the hardest topic I've ever heard of being studied on Prolific: high SES, low tech use, and self-shared data. Very interesting... #ACL2025NLP aclanthology.org/2025.acl-lon...

July 30, 2025 at 7:41 AM

Jacy Reese Anthis

@jacyanthis.bsky.social

@angelinawang.bsky.social presents the "Fairness through Difference Awareness" benchmark. Fairness tests require no discrimination...

but the law supports many forms of discrimination! E.g., synagogues should hire Jewish rabbis. LLMs often get this wrong aclanthology.org/2025.acl-lon... #ACL2025NLP

Angelina Wang presents the benchmark with Jewish synagogue hiring as an example.

July 30, 2025 at 7:26 AM

Jacy Reese Anthis

@jacyanthis.bsky.social

Morality in AI is often oversimplified. @davidjurgens.bsky.social and @shivanikumar.bsky.social kick off the "Human-Centred NLP" orals #ACL2025NLP with UniMoral, a huge dataset of moral scenario ratings in 6 languages! They find LLMs fail to simulated human moral decisions. bsky.app/profile/shiv...

David Jurgens gives talk with slides of country map

July 30, 2025 at 7:14 AM

Jacy Reese Anthis

@jacyanthis.bsky.social

I'm at #ACL2025 for 2 papers w/ @kldivergence.bsky.social et al! Let's chat, e.g., scaling evals, simulations, and HCI to unique challenges of general-purpose AI.

Bias in Language Models: Beyond Trick Tests and Towards RUTEd Evaluation
🗓️ Mon 11–12:30

The Impossibility of Fair LLMs
🗓️ Tue 16–17:30

Jacy Reese Anthis stands in front of the ACL 2025 Vienna sign!

July 27, 2025 at 12:54 PM

Jacy Reese Anthis

@jacyanthis.bsky.social

@diyiyang.bsky.social and @sherrytswu.bsky.social kick off #ACL2025 with "Human-AI Collaboration: How AIs Augment Human Teammates," showing why and how we need centaur evaluations. Realistic evals take work, but reliance on easy, short, and simple LLM evals led us to this current evaluation crisis.

July 27, 2025 at 8:05 AM

Jacy Reese Anthis

@jacyanthis.bsky.social

Second, in "Bias in Language Models: Beyond Trick Tests and Towards RUTEd Evaluation," we run 10,000s trials to test if standard fairness metrics predict bias in long-form writing tasks (e.g., write a bedtime story). Across several robustness checks, the answer is a strong no!

July 25, 2025 at 7:38 AM

Jacy Reese Anthis

@jacyanthis.bsky.social

First, in "The Impossibility of Fair LLMs," we go through each mathematical fairness framework: group fairness, causal fairness, etc. In each case, fairness is intractable. The training data is just too massive and there are too many contexts (users, use cases, demographics, etc.).

July 25, 2025 at 7:38 AM

Jacy Reese Anthis

@jacyanthis.bsky.social

Do bias and fairness metrics work for general-purpose AI like LLMs? In 2 papers just published in #ACL2025, we argue: not yet, but deep qualitative studies of social context scaled with AI assistance can get there!

Theory: aclanthology.org/2025.acl-lon...
Empirics: aclanthology.org/2025.acl-lon...

July 25, 2025 at 7:37 AM

Jacy Reese Anthis

@jacyanthis.bsky.social

Laura Nelson is waking us up #IC2S2 with keynote hot takes! Computational social scientists rely on a "qualitative moment" but gloss over this crucial step of deciding what the model means—whether it fits the actual target like "cultural alignment." Can models be humanlike or better used as "alien"?

July 23, 2025 at 7:41 AM

Jacy Reese Anthis

@jacyanthis.bsky.social

Great workshop today on LLMs @ic2s2.bsky.social. My main update: There are more NLP tasks where good old-fashioned LMs (e.g., BERT) still outperform "modern" decoders (e.g., GPT-4) than I realized! LLM-as-a-judge work should test these more often. TY @eollion.bsky.social @emilienschultz.bsky.social

Étienne standing in front of their title slide on "LLM Power to the People"

July 21, 2025 at 10:24 AM

Jacy Reese Anthis

@jacyanthis.bsky.social

Flying to Sweden to present at #IC2S2 2025 on how we can make sense of AI! We need big-picture social theory about AI instead of shoehorning particular aspects into human-human social theory. We conducted 57 interviews and analyzed millions of traditional/social media posts to build this framework.

Method, Mind, and Morality: How People Make Sense of Artificial Intelligence. Diagrams show an arrow between a bridge and a plant, a hammer and a cool guy in sunglasses and a suit, and a turtle and a rabbit. Authors are Jacy Reese Anthis, Erik Brynjolfsson, and James Evans. Images are created by Midjourney.

July 19, 2025 at 4:37 PM

Jacy Reese Anthis

@jacyanthis.bsky.social

Can LLMs simulate human research subjects for psychology, economics, and other fields? I'm on the ✈️ to Vancouver 🇨🇦 for #ICML2025 to present our position paper arguing yes! We *need* AI simulations so humanity's social understanding can keep pace with technological acceleration.

July 15, 2025 at 1:11 PM

Jacy Reese Anthis

@jacyanthis.bsky.social

Back to NYC for the summer for a @msftresearch.bsky.social project modeling the predictability and human-likeness of AI errors!

I stand in front of the Microsoft office door, smiling. It's glass, and you can see a NYC strett with a bike and a red building in the background.

July 5, 2025 at 11:06 AM

Jacy Reese Anthis

@jacyanthis.bsky.social

#3 There are several great high-level sociotechnical AI frameworks. So we need meso-level research streams to keep up with AI tech!

E.g., I saw 3 frameworks in 24 hours!
- #ICLR2025 "coevolution" (2 red-eye flights!)
- #CHI2025 keynote
- "bidirectional alignment" ICLR+CHI event

The 5 HAIC ICLR 2025 in-person organizers pose on stage.

The keynote slide says "Why Socio Technical Approaches Matter."

Standing room only at the introduction of the CHI Bi-Align event.

April 28, 2025 at 8:58 AM

Jacy Reese Anthis

@jacyanthis.bsky.social

#2 AI evaluation is still in its infancy because the world is far more complex than it seems.

@gaganbansal.bsky.social shared how they evaluate Microsoft's agent when it tries to recruit its own humans, file FOIA, counter bot detection, etc. Fascinating work today at the HEAL workshop #CHI2025

Gagan Bansal presents a slide with agent evaluation challenges at the HEAL workshop.

April 26, 2025 at 5:52 AM

Jacy Reese Anthis

@jacyanthis.bsky.social

こんにちは (hello) from #CHI2025, the world's largest ever human-computer interaction conference!

We're living through AI takeoff. AI technology is rocket fuel, but interaction is humanity's flight path.

In this thread I'll share insights from the coming week in Japan 🤖✈️🌸⛩️

Jacy and Kelly stand in front of orange gates and stonework at Mt. Inari near Kyoto, Japan.

April 25, 2025 at 11:59 AM

Jacy Reese Anthis

@jacyanthis.bsky.social

We lay out five tractable challenges that must be overcome for widespread use: Diversity, Bias, Sycophancy 😁, Alienness 👾, and Generalization 🌏. Current LLM sims can be used for pilots and exploratory studies, and we should trial sims for replication and sensitivity analysis.

April 4, 2025 at 3:50 PM

Add to Home Screen

Light up
your news

Add to Home Screen

Light upyour news

Sign in to Lightnews

Sign up to start reading

Connect Bluesky

Connect with Bluesky

Light up
your news