Mingqian Zheng
mingqian-zheng.bsky.social
How and when should LLM guardrails be deployed to balance safety and user experience?

Our #EMNLP2025 paper reveals that crafting thoughtful refusals rather than detecting intent is the key to human-centered AI safety.

📄 arxiv.org/abs/2506.00195
🧵[1/9]
October 20, 2025 at 8:04 PM
Reposted by Mingqian Zheng
Why do some emails get a reply and not others? Does it have more to do with how you write it or who you are—or maybe both? In our new #NAACL2025 paper we looked at 11M emails to causally test what factors will help you get a reply. 📬
May 1, 2025 at 3:15 AM
Reposted by Mingqian Zheng
When interacting with ChatGPT, have you wondered if they would ever "lie" to you? We found that under pressure, LLMs often choose deception. Our new #NAACL2025 paper, "AI-LIEDAR," reveals models were truthful less than 50% of the time when faced with utility-truthfulness conflicts! 🤯 1/
April 28, 2025 at 8:36 PM
Reposted by Mingqian Zheng
Did you know that gestures used to express universal concepts, like wishing for luck, vary DRAMATICALLY across cultures?
🤞 means luck in the US but is deeply offensive in Vietnam 🚨

📣 We introduce MC-SIGNS, a test bed to evaluate how LLMs/VLMs/T2I handle such nonverbal behavior!

📜: arxiv.org/abs/2502.17710
February 26, 2025 at 4:23 PM