Xuhui Zhou
@nlpxuhui.bsky.social
PhD student @ltiatcmu.bsky.social. Previously, @ai2.bsky.social, @uwnlp.bsky.social, @appleinc.bsky.social, @ucberkeleyofficial.bsky.social; Social Intelligence in language +X. He/Him.🐳
Reposted by Xuhui Zhou
New research from LTI, UMich, & Allen Institute for AI: LLMs don’t just hallucinate – sometimes, they lie. When truthfulness clashes with utility (pleasing users, boosting brands), models often mislead. @nlpxuhui.bsky.social and @maartensap.bsky.social discuss the paper:
lti.cmu.edu/news-and-eve...
Does Your Chatbot Swear to Tell the Truth? - Language Technologies Institute - School of Computer Science - Carnegie Mellon University
New research finds that LLM-based agents can't always be trusted to be truthful
lti.cmu.edu
June 26, 2025 at 7:21 PM
When interacting with ChatGPT, have you wondered if it would ever "lie" to you? We found that under pressure, LLMs often choose deception. Our new #NAACL2025 paper, "AI-LIEDAR," reveals models were truthful less than 50% of the time when faced with utility-truthfulness conflicts! 🤯 1/
April 28, 2025 at 8:36 PM
Reposted by Xuhui Zhou
Reward models for LMs are meant to align outputs with human preferences—but do they accidentally encode dialect biases? 🤔
Excited to share our paper on biases against African American Language in reward models, accepted to #NAACL2025 Findings! 🎉
Paper: arxiv.org/abs/2502.12858 (1/10)
March 6, 2025 at 7:49 PM
Reposted by Xuhui Zhou
We are getting closer to having agents operating in the real physical world. However, can we trust frontier models to make embodied decisions 🎮 aligned with human norms 👩‍⚖️?
With EgoNormia, a 1.8k ego-centric video 🥽 QA benchmark, we show that this is surprisingly challenging!
March 4, 2025 at 4:32 AM
LLM agents can code—but can they ask clarifying questions? 🤖💬
Tired of coding agents wasting time and API credits, only to output broken code? What if they asked first instead of guessing? 🚀
(New work led by Sanidhya Vijay: www.linkedin.com/in/sanidhya-...)
February 19, 2025 at 7:46 PM
Excited to share that I'm joining All Hands AI (www.all-hands.dev) this summer as a research intern! 🚀
AI agents are becoming incredibly powerful, but their true potential lies in how they interact with and assist humans in meaningful ways.
All Hands AI
www.all-hands.dev
February 6, 2025 at 4:27 PM
Reposted by Xuhui Zhou
I like the BlueSky approach to "verification". If you own a domain, you can make a DNS record to turn it into your BlueSky handle!
bsky.social/about/blog/4...
How to verify your Bluesky account - Bluesky
Here's how to verify your Bluesky account by setting your website as your username.
bsky.social
November 27, 2024 at 4:32 AM
Reposted by Xuhui Zhou
Hello, Bluesky! Happy to be scrolling the friendly skies with you. Follow for news and updates on LTI folks and their trailblazing research. #AI #NLP #ML #computerscience
November 20, 2024 at 4:04 PM
Reposted by Xuhui Zhou
some little bluesky tips 🦋
your blocks, likes, lists, and just about everything except chats are PUBLIC
you can pin custom feeds; i like quiet posters, best of follows, mutuals, mentions
if your chronological feed is overwhelming, you can make and pin a personal list of "unmissable" people
November 20, 2024 at 11:56 AM
Reposted by Xuhui Zhou
Looking for all your LTI friends on Bluesky? The LTI Starter Pack is here to help!
go.bsky.app/NhTwCVb
November 20, 2024 at 4:15 PM