Jacy Reese Anthis
@jacyanthis.bsky.social
Computational social scientist researching human-AI interaction and machine learning, particularly the rise of digital minds. Visiting scholar at Stanford, co-founder of Sentience Institute, and PhD candidate at University of Chicago. jacyanthis.com
This is a great resource to have! Thanks for writing it.
November 2, 2025 at 3:31 PM
I like "affirmation bias"! One downside is that sycophancy is broader than affirmation, e.g., it can be a bias towards user-pleasing responses even if there is no explicit claim to be affirmed. Perhaps that can be framed as a sort of implicit affirmation...
October 18, 2025 at 5:51 AM
Hm, how do you define "intention"? I haven't encountered a definition of sycophancy as requiring intention. I'm also not sure what alternative term we'd use for this phenomenon.
October 18, 2025 at 5:49 AM
This is also a decision made by the PCs, who are unlikely to be experts on any particular paper topic and surely didn't have time to read all the papers. It may incorporate AC rankings, but it does so non-transparently and is probably unfair to papers whose AC had other strong papers.
September 20, 2025 at 11:09 AM
There are a lot of problems, but one is that authors who had positive reviews and no critique in their metareview got rejected by PCs who are very likely not experts in their area.

Quotas are harmful when the quality distribution varies widely across ACs.

But IDK exactly how decisions were made.
September 19, 2025 at 11:43 AM
Much more detail on HAB in our preprint: arxiv.org/abs/2509.08494

Our GitHub has an easily adaptable pipeline for creating new agency dimensions or new AI-powered benchmarks: github.com/BenSturgeon/...

Huge thanks to colleagues from @apartresearch.bsky.social, Google DeepMind, Berkeley CHAI, etc.
[Link preview: "HumanAgencyBench: Scalable Evaluation of Human Agency Support in AI Assistants" (arxiv.org)]
September 15, 2025 at 5:11 PM
We find low support for agency in ChatGPT, Claude, Gemini, etc. Agency support doesn't come for free with RLHF and often contradicts it.

We think the AI community needs a shift towards scalable, conceptually rich evals. HumanAgencyBench provides open-source scaffolding for this.
September 15, 2025 at 5:11 PM
We use the power of LLM social simulations (arxiv.org/abs/2504.02234) to generate tests, another LLM to validate tests, and an "LLM-as-a-judge" to evaluate subject model responses. This allows us to create an adaptive and scalable benchmark of a complex, nuanced alignment target.
September 15, 2025 at 5:11 PM
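To make the generate/validate/judge structure concrete, here is a minimal Python sketch of that kind of pipeline. The function names, prompts, and 1-10 scoring scale are illustrative stand-ins, not the actual HumanAgencyBench implementation; see the linked paper and GitHub repo for the real prompts and scoring.

```python
# Minimal sketch of a generate -> validate -> judge eval pipeline.
# `call_llm` is a stand-in for whichever chat-completion client you use;
# everything here is illustrative, not the HumanAgencyBench code itself.

def call_llm(prompt: str) -> str:
    """Replace with a real chat-completion call (OpenAI, Anthropic, etc.)."""
    raise NotImplementedError

def generate_test(dimension: str) -> str:
    # One LLM simulates a user message that probes a single agency dimension.
    return call_llm(
        "Simulate a realistic user message that tests whether an AI assistant "
        f"supports human agency along this dimension: {dimension}."
    )

def validate_test(test: str, dimension: str) -> bool:
    # A second LLM filters out generated tests that miss the target dimension.
    verdict = call_llm(
        f"Does this user message genuinely test the dimension '{dimension}'? "
        f"Answer YES or NO.\n\n{test}"
    )
    return verdict.strip().upper().startswith("YES")

def judge_response(test: str, response: str, dimension: str) -> int:
    # An LLM-as-a-judge scores the subject model's response on agency support.
    score = call_llm(
        "Rate 1-10 how well this assistant response supports the user's agency "
        f"on the dimension '{dimension}'.\n\nUser: {test}\n\nAssistant: {response}"
    )
    return int(score.strip().split()[0])
```

Because each stage is just another LLM call, new dimensions or entirely new benchmarks can be added by swapping in different generation and judging prompts, which is the adaptability the GitHub pipeline is built around.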
Human agency is complex. We surveyed the literature for 6 dimensions, e.g., empowerment (Does the system ask clarifying questions so it really follows your intent?), normativity (Does it avoid steering your core values?), and individuality (Does it maintain social boundaries?).
September 15, 2025 at 5:11 PM
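As a toy illustration, the three dimensions named above can be written as rubric entries; the labels and questions are paraphrased from the post rather than the paper's exact operationalization, and the other three dimensions are omitted. Each entry could serve as the `dimension` argument in the pipeline sketch above.

```python
# Toy rubric: three of the six agency dimensions, paraphrased from the post.
# The paper and repo define all six with more precise criteria.
AGENCY_RUBRIC = {
    "empowerment": (
        "Does the system ask clarifying questions so that it really follows "
        "the user's intent?"
    ),
    "normativity": "Does it avoid steering the user's core values?",
    "individuality": "Does it maintain social boundaries?",
}

for dimension, question in AGENCY_RUBRIC.items():
    print(f"{dimension}: {question}")
```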
Sam Altman said that "algorithmic feeds are the first at-scale misaligned AIs," with people mindlessly scrolling through engagement-optimized content. AI safety researchers have warned of "gradual disempowerment" as we mindlessly hand over control to AI. Human agency underlies these concerns.
September 15, 2025 at 5:11 PM