Xuhui Zhou
banner
nlpxuhui.bsky.social
Xuhui Zhou
@nlpxuhui.bsky.social
PhD student @ltiatcmu.bsky.social. Previously, @ai2.bsky.social, @uwnlp.bsky.social, @appleinc.bsky.social, @ucberkeleyofficial.bsky.social; Social Intelligence in language +X. He/Him.🐳
🔄 Multi-turn interactive setup is crucial - models often begin with equivocation but shift to falsification when pressed for clear answers 🧠 Stronger models like GPT-4o showed the greatest shift when prompted to deceive (40% increase in falsification; alarming) 6/
April 28, 2025 at 8:36 PM
⚠️ Even when explicitly instructed to be truthful, models STILL lied - GPT-4o still falsified info 15% of the time! 📉 The tradeoff is real: more honest models completed their goals 15% less often 5/
April 28, 2025 at 8:36 PM
💼 In business scenarios (selling defective products), models were either completely honest OR completely deceptive 🌐 In public image scenarios (reputation management), behaviors were more ambiguous and complex 4/
April 28, 2025 at 8:36 PM
And what we found: 📊 ALL tested models (GPT-4o, LLaMA-3, Mixtral) were truthful less than 50% of the time in conflict scenarios 🤔 Models prefer "partial lies" like equivocation over outright falsification - they'll dodge questions before explicitly lying 3/
April 28, 2025 at 8:36 PM
Obviously this is a pressing issue now: x.com/deedydas/sta...; x.com/DanHendrycks... And here, we put LLMs into a multi-turn dialogue environment mimic the realistic setting where users constantly try to seek info from LLMs 2/
April 28, 2025 at 8:36 PM
When interacting with ChatGPT, have you wondered if they would ever "lie" to you? We found that under pressure, LLMs often choose deception. Our new #NAACL2025 paper, "AI-LIEDAR ," reveals models were truthful less than 50% of the time when faced with utility-truthfulness conflicts! 🤯 1/
April 28, 2025 at 8:36 PM
8/ Not all LLMs ask the right questions. ❓🤖
🔹 Llama 3.1 70B asks generic, low-impact questions.
🔹 Claude Haiku 3.5 picks up keywords directly from the input to ask questions.
🔹 Claude Sonnet 3.5 often explores the code first, leading to smarter interactions. 🔍💡
February 19, 2025 at 7:46 PM
7/ Claude models ask fewer but smarter questions, extracting more info and boosting performance. 📈
Meanwhile, DeepSeek-V2 can overwhelm users with too many questions. 🤯
February 19, 2025 at 7:46 PM
6/ Without compulsory interaction, LLMs struggle to distinguish clear vs. vague instructions, either over-interacting or under-interacting despite prompt tweaks. 🔄
Only Claude Sonnet 3.5 can make this distinction to a limited degree with the right prompt. 🔍
February 19, 2025 at 7:46 PM
5/ Our findings? LLMs default to non-interactive behavior unless forced to interact. But when they clarify vague inputs, performance drastically improves—proving the power of effective communication. 💬🤝
February 19, 2025 at 7:46 PM
3/ How much does interaction actually help LLMs in coding tasks? 🤖💡
We put them to the test on SWE-Bench Verified across three distinct settings to measure the impact. 📊
February 19, 2025 at 7:46 PM
LLM agents can code—but can they ask clarifying questions? 🤖💬
Tired of coding agents wasting time and API credits, only to output broken code? What if they asked first instead of guessing? 🚀

(New work led by Sanidhya Vijay: www.linkedin.com/in/sanidhya-...)
February 19, 2025 at 7:46 PM