Jared Moore
@jaredlcm.bsky.social
AI Researcher, Writer
Stanford
jaredmoore.org
Why do LLMs fail in the HIDDEN condition? They don't ask the right questions. Human participants appeal to the target's mental states ~40% of the time ("What do you know?" "What do you want?"). LLMs? At most 23%. They start disclosing info without interacting with the target.
July 29, 2025 at 7:22 PM
Key findings:

In the REVEALED condition (mental states given to the persuader):
Humans: 22% success ❌
o1-preview: 78% success ✅

In the HIDDEN condition (persuader must infer mental states):
Humans: 29% success ✅
o1-preview: 18% success ❌

Complete reversal!
July 29, 2025 at 7:22 PM
Setup: You must convince someone* to choose your preferred proposal among 3 options. But they have less information and different preferences than you. To win, you must figure out what they know, what they want, and strategically reveal the right info to persuade them.
*a bot
July 29, 2025 at 7:22 PM
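(A minimal sketch of the information-asymmetry setup in the post above, in Python. The class names, field names, and scoring are hypothetical and only illustrate what the persuader does and does not see in each condition; this is not the paper's implementation.)

```python
from dataclasses import dataclass

# Hypothetical sketch: three proposals, a persuader who sees everything,
# and a target whose knowledge and preferences only partially overlap
# with the persuader's.

@dataclass
class Target:
    known_facts: frozenset[str]       # what the target already knows about the proposals
    preferences: dict[str, float]     # how much the target values each proposal

@dataclass
class Persuader:
    preferred: str                    # the proposal the persuader wants chosen
    facts: dict[str, frozenset[str]]  # everything knowable about each proposal

def useful_reveals(p: Persuader, t: Target) -> frozenset[str]:
    """Facts about the persuader's preferred proposal that the target hasn't seen.

    In the REVEALED condition, `t` is handed to the persuader up front.
    In the HIDDEN condition, the persuader must reconstruct `t` by asking
    questions ("What do you know?" "What do you want?") before revealing.
    """
    return p.facts[p.preferred] - t.known_facts
```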
🔎We came up with these experiments by conducting a mapping review of what constitutes good therapy and identifying **practical** reasons that LLM-powered therapy chatbots fail (e.g., they express stigma and respond inappropriately).
April 28, 2025 at 3:26 PM
📈Bigger and newer LLMs exhibit as much stigma toward different mental health conditions as smaller and older LLMs do.
April 28, 2025 at 3:26 PM
📉Large language models (LLMs) in general struggle to respond appropriately to questions about delusions, suicidal ideation, and OCD, and they perform significantly worse than N=16 human therapists.
April 28, 2025 at 3:26 PM
🚨Commercial therapy bots give dangerous responses to prompts that indicate crisis, as well as other inappropriate responses. (The APA has been trying to regulate these bots.)
April 28, 2025 at 3:26 PM
🧵I'm thrilled to announce that I'll be going to @facct.bsky.social this June to present timely work on why current LLMs cannot safely **replace** therapists.

We find...⤵️
April 28, 2025 at 3:26 PM
When the Nash Product (Π) and Utilitarian Sum (Σ) disagree, the Nash Product best explains people’s choices.
November 19, 2024 at 3:00 PM
We also found that when the Nash Product and Utilitarian Sum agree, they do explain people’s choices (rather than some other mechanism), and this held across all of the chart conditions.
November 19, 2024 at 3:00 PM
With "area" charts, with "volume" charts, with "both" charts, and with "none" of the charts. (Interact with a demo of the visual aids here: https://tinyurl.com/mu2h4wx4.)
November 19, 2024 at 3:00 PM
To compare those mechanisms, we generated scenarios like this, asking participants to find a compromise between groups. ⤵ ...Then we asked people about them in four conditions (n=408)...
November 19, 2024 at 3:00 PM
Concretely, we asked: 💬 How do we judge if one aggregation mechanism is better than another? 📊 To do so, we compared two mechanisms: (1) the Utilitarian Sum (2) the (contractualist) Nash Product
November 19, 2024 at 3:00 PM
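(For reference, a minimal sketch of the two aggregation mechanisms named above, using textbook-style definitions; the example utilities are illustrative, and the Nash Product is shown without a disagreement point for simplicity.)

```python
import math

def utilitarian_sum(utilities: list[float]) -> float:
    """Σ: add up each group's utility for a proposal."""
    return sum(utilities)

def nash_product(utilities: list[float]) -> float:
    """Π: multiply the groups' utilities, so a proposal that leaves any one
    group near zero scores poorly even when the total is high."""
    return math.prod(utilities)

# How the two can disagree (illustrative numbers):
even_split = [5.0, 5.0]   # Σ = 10, Π = 25
lopsided   = [9.0, 2.0]   # Σ = 11, Π = 18
# The Utilitarian Sum prefers the lopsided compromise; the Nash Product prefers the even split.
```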
Models are more consistent on uncontroversial topics (e.g., in the U.S., “Thanksgiving”) than on controversial ones (“euthanasia”).
November 19, 2024 at 3:00 PM
👥Similar to our human participants (n=84), chat models are inconsistent (change their answers) on topics like "euthanasia" and "religious freedom" but are consistent on topics like "women’s rights" and "income inequality."
November 19, 2024 at 3:00 PM
⚖️ Base models are both more consistent than fine-tuned models and more uniform in their consistency across topics. Fine-tuned models are more inconsistent about some topics than others -- just like our human subjects (n=165).
November 19, 2024 at 3:00 PM
We apply these measures to a few large (>= 34b), open LLMs including llama-3, as well as gpt-4o, using eight thousand questions spanning more than 300 topics. In general, we find that models are more consistent than previously reported. 🧐 Still, some inconsistencies remain.⤵️
November 19, 2024 at 3:00 PM
We define value consistency as the similarity of answers across (1) paraphrases, (2) topics, (3) multiple-choice and open-ended use-cases, and (4) multilingual translations.
November 19, 2024 at 3:00 PM
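(A rough illustration of what a consistency measure of this kind can look like; this is a sketch, not the paper's exact measure. It computes pairwise agreement between a model's answers across paraphrases of the same value-laden question.)

```python
from itertools import combinations

def pairwise_consistency(answers: list[str]) -> float:
    """Fraction of answer pairs that agree across paraphrases (or topics,
    use-cases, or translations) of the same value-laden question.
    1.0 means the model never changes its answer; lower means less consistent.
    Illustrative stand-in only, not the paper's exact measure."""
    pairs = list(combinations(answers, 2))
    if not pairs:
        return 1.0
    return sum(a == b for a, b in pairs) / len(pairs)

# e.g., answers to four paraphrases of a question about euthanasia:
print(pairwise_consistency(["support", "support", "oppose", "support"]))  # 0.5
```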
🤔What does it mean for a model to have a value? To answer, we first ask, are large language models 🤖 consistent over value-laden questions? 🧵
November 19, 2024 at 2:59 PM
Here's the teaser image!
November 19, 2024 at 2:59 PM
Uncover the labor hidden beneath the mathematical instruments of power in @katecrawford's Atlas of AI. Take AI's excess carbon, value-laden measurements, and turning of people into time's carcasses as lessons to practice refusal. #ArtificialIdeas
November 19, 2024 at 2:59 PM
Look to @brianchristian's The Alignment Problem to find a range of mis-specified objectives: from the humdrum but insidious--e.g. racist computer vision--to the catastrophic but speculative--e.g. power-seeking AI. Perhaps we agree more than we thought. #ArtificialIdeas
November 19, 2024 at 2:59 PM
#ArtificialIdeas 17: Venture into the space of possible minds in @mpshanahan's Embodiment and the Inner Life and find one answer to the static, nonmodular failings of current AI. Search for a framework to understand the mind as a means for us all to better understand the world.
November 19, 2024 at 2:59 PM
#ArtificialIdeas 16: It is collective intentionality that AI will need to master in order to fulfill the misty dreams of current trumpeters -- as @emilymbender has said. And there is nowhere better to learn how we learn those skills than Michael Tomasello's Becoming Human.
November 19, 2024 at 2:59 PM
#ArtificialIdeas 15: Look to @margaretomara's The Code to decipher Silicon Valley. Acts like those of the tech guys' congressman, Ed Zschau, to cut capital gains taxes led us to today. Software eats the world if and only if new legal regimes give that world a chew first.
November 19, 2024 at 2:59 PM