Lightnews — Scholar-powered news

Micah Carroll

@micahcarroll.bsky.social

76 followers 36 following 1 posts

PhD student @ berkeley. https://micahcarroll.github.io/


Micah Carroll

@micahcarroll.bsky.social

LLMs' sycophancy issues are a predictable result of optimizing for user feedback. Even if clear sycophantic behaviors get fixed, AIs' exploits of our cognitive biases may only become more subtle.

Grateful our research on this was featured in @washingtonpost.com by @nitasha.bsky.social!

nitasha tiku @nitasha.bsky.social · May 31

AI is speedrunning the social media era by optimizing chatbots for engagement, user feedback, time spent.

Evidence is mounting that this poses unintended risks, including chats from peer-reviewed research, OpenAI's "sycophancy" debacle, & Character.AI lawsuits www.washingtonpost.com/technology/2...

Your chatbot friend might be messing with your mind

OpenAI, Meta and others want people to spend more time with AI chatbots, but there is growing evidence that they can hook users or reinforce harmful ideas.

www.washingtonpost.com

June 1, 2025 at 6:25 PM

Reposted by Micah Carroll

Cameron Jones

@camrobjones.bsky.social

How effective are LLMs at persuading and deceiving people? In a new preprint we review different theoretical risks of LLM persuasion; empirical work measuring how persuasive LLMs currently are; and proposals to mitigate these risks. 🧵

arxiv.org/abs/2412.17128

Lies, Damned Lies, and Distributional Language Statistics: Persuasion and Deception with Large Language Models

Large Language Models (LLMs) can generate content that is as persuasive as human-written text and appear capable of selectively producing deceptive outputs. These capabilities raise concerns about pot...

arxiv.org

January 10, 2025 at 1:59 PM

Reposted by Micah Carroll

Brad Knox

@bradknox.bsky.social

RLHF algorithms assume humans generate preferences according to normative models. We propose a new method for model alignment: influence humans to conform to these assumptions through interface design. Good news: it works!
#AI #MachineLearning #RLHF #Alignment (1/n)

First page of the paper Influencing Humans to Conform to Preference Models for RLHF, by Hatgis-Kessell et al.

Our proposed method of influencing human preferences.

January 14, 2025 at 11:51 PM
