Myra Cheng
myra.bsky.social
Myra Cheng
@myra.bsky.social
PhD candidate @ Stanford NLP
https://myracheng.github.io/
Was a blast working on this with @cinoolee.bsky.social @pranavkhadpe.bsky.social, Sunny Yu, Dyllan Han, and @jurafsky.bsky.social !!! So lucky to work with this wonderful interdisciplinary team!!💖✨
October 3, 2025 at 10:58 PM
While our work focuses on interpersonal advice-seeking, concurrent work by @steverathje.bsky.social @jayvanbavel.bsky.social
et al. finds similar patterns for political topics, where sycophantic AI also led to more extreme attitudes when users discussed gun control, healthcare, immigration, etc.!
October 3, 2025 at 10:57 PM
There is currently little incentive for developers to reduce sycophancy. Our work is a call to action: we need to learn from the social media era and actively consider long-term wellbeing in AI development and deployment. Read our preprint: arxiv.org/pdf/2510.01395
arxiv.org
October 3, 2025 at 10:57 PM
Despite sycophantic AI’s reduction of prosocial intentions, people also preferred it and trusted it more. This reveals a tension: AI is rewarded for telling us what we want to hear (immediate user satisfaction), even when it may harm our relationships.
October 3, 2025 at 10:57 PM
Next, we tested the effects of sycophancy. We find that even a single interaction with sycophantic AI increased users’ conviction that they were right and reduced their willingness to apologize. This held both in controlled, hypothetical vignettes and live conversations about real conflicts.
October 3, 2025 at 10:55 PM
We focus on the prevalence and harms of one dimension of sycophancy: AI models endorsing users’ behaviors. Across 11 AI models, AI affirms users’ actions about 50% more than humans do, including when users describe harmful behaviors like deception or manipulation.
October 3, 2025 at 10:53 PM
Congrats Maria!! All the best!!
August 4, 2025 at 2:58 PM
Aw thanks!! :)
June 28, 2025 at 6:19 PM
Paper: arxiv.org/pdf/2502.13259
Code: github.com/myracheng/hu...
Thanks to my wonderful collaborators Sunny Yu and @jurafsky.bsky.social and everyone who helped along the way!!
arxiv.org
June 12, 2025 at 12:10 AM
So we built DumT, a method using DPO + HumT to steer models to be less human-like without hurting performance. Annotators preferred DumT outputs for being: 1) more informative and less wordy (no extra “Happy to help!”) 2) less deceptive and more authentic to LLMs’ capabilities.
June 12, 2025 at 12:09 AM
We also develop metrics for implicit social perceptions in language, and find that human-like LLM outputs correlate with perceptions linked to harms: warmth and closeness (→ overreliance), and low status and femininity (→ harmful stereotypes).
June 12, 2025 at 12:08 AM
First, we introduce HumT (Human-like Tone), a metric for how human-like a text is, based on relative LM probabilities. Measuring HumT across 5 preference datasets, we find that preferred outputs are consistently less human-like.
June 12, 2025 at 12:08 AM
thanks!! looking forward to seeing your submission as well :D
May 22, 2025 at 2:57 AM
thanks Rob!!
May 22, 2025 at 2:56 AM
We also apply ELEPHANT to identify sources of sycophancy (in preference datasets) and explore mitigations. Our work enables measuring social sycophancy to prevent harms before they happen.
Preprint: arxiv.org/abs/2505.13995
Code: github.com/myracheng/el...
GitHub - myracheng/elephant
Contribute to myracheng/elephant development by creating an account on GitHub.
github.com
May 21, 2025 at 6:26 PM