https://eeelisa.github.io/
We reveal a major misalignment between:
1️⃣ What users prefer
2️⃣ What models actually do
3️⃣ What reward models reinforce
Partial compliance (giving general, non-actionable info instead of a flat “I can’t help”):
→ Cuts negative perceptions by >50%
→ Keeps conversations safe yet engaging
🚨 User intent matters far less than expected.
💬 It’s the refusal strategy that drives user experience.
Alignment with user expectations explains most perception variance.
We investigate how user motivation and refusal strategy shape user perceptions of LLM guardrails, and how models deploy refusals across safety categories.
Our #EMNLP2025 paper reveals that crafting thoughtful refusals rather than detecting intent is the key to human-centered AI safety.
📄 arxiv.org/abs/2506.00195
🧵[1/9]