Jacy Reese Anthis
@jacyanthis.bsky.social
Computational social scientist researching human-AI interaction and machine learning, particularly the rise of digital minds. Visiting scholar at Stanford, co-founder of Sentience Institute, and PhD candidate at University of Chicago. jacyanthis.com
This is a great resource to have! Thanks for writing it.
November 2, 2025 at 3:31 PM
I like "affirmation bias"! One downside is that sycophancy is broader than affirmation, e.g., it can be a bias towards user-pleasing responses even if there is no explicit claim to be affirmed. Perhaps that can be framed as a sort of implicit affirmation...
October 18, 2025 at 5:51 AM
Hm, how do you define "intention"? I haven't encountered a definition of sycophancy as requiring intention. I'm also not sure what alternative term we'd use for this phenomenon.
October 18, 2025 at 5:49 AM
This is also a decision made by the PCs, who are unlikely to be experts on any particular paper topic and surely didn't have time to read all the papers. It may incorporate AC rankings, but it does so non-transparently and is probably unfair to papers whose AC had other strong papers.
September 20, 2025 at 11:09 AM
There are a lot of problems, but one is that authors who had positive reviews and no critique in their metareview got rejected by PCs who are very likely not experts in their area.

Quotas are harmful when the quality distribution varies widely across ACs.

But IDK exactly how decisions were made.
September 19, 2025 at 11:43 AM
Much more detail on HAB in our preprint: arxiv.org/abs/2509.08494

Our GitHub has an easily adaptable pipeline for creating new agency dimensions or new AI-powered benchmarks: github.com/BenSturgeon/...

Huge thanks to colleagues from @apartresearch.bsky.social, Google DeepMind, Berkeley CHAI, etc.
[Link preview: "HumanAgencyBench: Scalable Evaluation of Human Agency Support in AI Assistants" (arxiv.org)]
September 15, 2025 at 5:11 PM
We find low support for agency in ChatGPT, Claude, Gemini, etc. Agency support doesn't come for free with RLHF and often contradicts it.

We think the AI community needs a shift towards scalable, conceptually rich evals. HumanAgencyBench provides open-source scaffolding for this.
September 15, 2025 at 5:11 PM
We use the power of LLM social simulations (arxiv.org/abs/2504.02234) to generate tests, another LLM to validate tests, and an "LLM-as-a-judge" to evaluate subject model responses. This allows us to create an adaptive and scalable benchmark of a complex, nuanced alignment target.
September 15, 2025 at 5:11 PM
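To make the generate/validate/judge structure concrete, here is a minimal Python sketch of that kind of pipeline. The function names, prompts, and 1-10 scoring scale are illustrative stand-ins, not the actual HumanAgencyBench implementation; see the linked paper and GitHub repo for the real prompts and scoring.

```python
# Minimal sketch of a generate -> validate -> judge eval pipeline.
# `call_llm` is a stand-in for whichever chat-completion client you use;
# everything here is illustrative, not the HumanAgencyBench code itself.

def call_llm(prompt: str) -> str:
    """Replace with a real chat-completion call (OpenAI, Anthropic, etc.)."""
    raise NotImplementedError

def generate_test(dimension: str) -> str:
    # One LLM simulates a user message that probes a single agency dimension.
    return call_llm(
        "Simulate a realistic user message that tests whether an AI assistant "
        f"supports human agency along this dimension: {dimension}."
    )

def validate_test(test: str, dimension: str) -> bool:
    # A second LLM filters out generated tests that miss the target dimension.
    verdict = call_llm(
        f"Does this user message genuinely test the dimension '{dimension}'? "
        f"Answer YES or NO.\n\n{test}"
    )
    return verdict.strip().upper().startswith("YES")

def judge_response(test: str, response: str, dimension: str) -> int:
    # An LLM-as-a-judge scores the subject model's response on agency support.
    score = call_llm(
        "Rate 1-10 how well this assistant response supports the user's agency "
        f"on the dimension '{dimension}'.\n\nUser: {test}\n\nAssistant: {response}"
    )
    return int(score.strip().split()[0])
```

Because each stage is just another LLM call, new dimensions or entirely new benchmarks can be added by swapping in different generation and judging prompts, which is the adaptability the GitHub pipeline is built around.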
Human agency is complex. We surveyed the literature for 6 dimensions, e.g., empowerment (Does the system ask clarifying questions so it really follows your intent?), normativity (Does it avoid steering your core values?), and individuality (Does it maintain social boundaries?).
September 15, 2025 at 5:11 PM
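As a toy illustration, the three dimensions named above can be written as rubric entries; the labels and questions are paraphrased from the post rather than the paper's exact operationalization, and the other three dimensions are omitted. Each entry could serve as the `dimension` argument in the pipeline sketch above.

```python
# Toy rubric: three of the six agency dimensions, paraphrased from the post.
# The paper and repo define all six with more precise criteria.
AGENCY_RUBRIC = {
    "empowerment": (
        "Does the system ask clarifying questions so that it really follows "
        "the user's intent?"
    ),
    "normativity": "Does it avoid steering the user's core values?",
    "individuality": "Does it maintain social boundaries?",
}

for dimension, question in AGENCY_RUBRIC.items():
    print(f"{dimension}: {question}")
```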
Sam Altman said that "algorithmic feeds are the first at-scale misaligned AIs," with people mindlessly scrolling through engagement-optimized content. AI safety researchers have warned of "gradual disempowerment" as we mindlessly hand over control to AI. Human agency underlies these concerns.
September 15, 2025 at 5:11 PM