Maarten Sap
@maartensap.bsky.social
Working on #NLProc for social good.
Currently at LTI at CMU. 🏳️‍🌈
Reposted by Maarten Sap
How and when should LLM guardrails be deployed to balance safety and user experience?
Our #EMNLP2025 paper reveals that crafting thoughtful refusals rather than detecting intent is the key to human-centered AI safety.
📄 arxiv.org/abs/2506.00195
🧵[1/9]
October 20, 2025 at 8:04 PM
Reposted by Maarten Sap
📣📣 Announcing the first PersonaLLM Workshop on LLM Persona Modeling.
If you work on persona-driven LLMs, social cognition, HCI, psychology, cognitive science, cultural modeling, or evaluation, do not miss the chance to submit.
Submit here: openreview.net/group?id=Neu...
NeurIPS 2025 Workshop Mexico City PersonaNLP
Welcome to the OpenReview homepage for NeurIPS 2025 Workshop Mexico City PersonaNLP
openreview.net
October 17, 2025 at 12:57 AM
I’m ✨ super excited and grateful ✨ to announce that I'm part of the 2025 class of #PackardFellows (www.packard.org/2025fellows). The @packardfdn.bsky.social and this fellowship will allow me to explore exciting research directions towards culturally responsible and safe AI 🌍🌈
October 15, 2025 at 1:06 PM
Reposted by Maarten Sap
🚨New paper: Reward Models (RMs) are used to align LLMs, but can they be steered toward user-specific value/style preferences?
With EVALUESTEER, we find even the best RMs we tested exhibit their own value/style biases, and are unable to align with a user >25% of the time. 🧵
October 14, 2025 at 3:59 PM
Reposted by Maarten Sap
We are launching our Graduate School Application Financial Aid Program (www.queerinai.com/grad-app-aid) for 2025-2026. We’ll give up to $750 per person to LGBTQIA+ STEM scholars applying to graduate programs. Apply at openreview.net/group?id=Que.... 1/5
Grad App Aid — Queer in AI
www.queerinai.com
October 9, 2025 at 12:37 AM
I'm also giving a talk at #COLM2025 Social Simulation workshop (sites.google.com/view/social-...) on Unlocking Social Intelligence in AI, at 2:30pm Oct 10th!
October 6, 2025 at 2:53 PM
Reposted by Maarten Sap
📢 New #COLM2025 paper 📢
Standard benchmarks give every LLM the same questions. This is like testing 5th graders and college seniors with *one* exam! 🥴
Meet Fluid Benchmarking, a capability-adaptive eval method delivering lower variance, higher validity, and reduced cost.
🧵
September 16, 2025 at 5:16 PM
That's a lot of people! Fall Sapling lab outing, welcoming our new postdoc Vasudha, and visitors Tze Hong and Chani! (just missing Jocelyn)
August 26, 2025 at 5:53 PM
I'm excited because I'm teaching/coordinating a unique new class where we teach new PhD students all the "soft" skills of research, incl. ideation, reviewing, presenting, interviewing, advising, etc.
Each lecture is taught by a different LTI prof! It takes a village! maartensap.com/11705/Fall20...
August 25, 2025 at 6:01 PM
I spoke to Forbes about why model "welfare" is a silly framing of an important issue; models don't have feelings, and it's a big distraction from real questions like the tension between safety and user utility, which are NLP/HCI/policy questions www.forbes.com/sites/victor...
August 22, 2025 at 1:00 PM
Super super excited about this :D :D
A hearty congratulations to the LTI's @maartensap.bsky.social, who's been awarded an Okawa Research Grant for his work in socially-aware artificial intelligence. lti.cmu.edu/news-and-eve...
Sap Awarded 2025 Okawa Research Grant - Language Technologies Institute - School of Computer Science - Carnegie Mellon University
LTI Assistant Professor Maarten Sap received the prestigious award for his work in socially-aware artificial intelligence
lti.cmu.edu
August 20, 2025 at 6:14 PM
Reposted by Maarten Sap
Hand gestures are a major mode of human communication, but they don't always translate well across cultures. New research from @akhilayerukola.bsky.social, @maartensap.bsky.social and others is aimed at giving AI systems a hand with overcoming cultural biases:
lti.cmu.edu/news-and-eve...
Using Hand Gestures To Evaluate AI Biases - Language Technologies Institute - School of Computer Science - Carnegie Mellon University
LTI researchers have created a model to help generative AI systems understand the cultural nuance of gestures.
lti.cmu.edu
June 27, 2025 at 6:04 PM
Reposted by Maarten Sap
New research from LTI, UMich, & Allen Institute for AI: LLMs don’t just hallucinate – sometimes, they lie. When truthfulness clashes with utility (pleasing users, boosting brands), models often mislead. @nlpxuhui.bsky.social and @maartensap.bsky.social discuss the paper:
lti.cmu.edu/news-and-eve...
Does Your Chatbot Swear to Tell the Truth? - Language Technologies Institute - School of Computer Science - Carnegie Mellon University
New research finds that LLM-based agents can't always be trusted to be truthful
lti.cmu.edu
June 26, 2025 at 7:21 PM
Reposted by Maarten Sap
What if AI played the role of your sassy gay bestie 🏳️🌈 or AAVE-speaking friend 👋🏾?
You: “Can you plan a trip?”
🤖 AI: “Yasss queen! let’s werk this babe✨💅”
LLMs can talk like us, but it shapes how we trust, rely on & relate to them 🧵
📣 our #FAccT2025 paper: bit.ly/3HJ6rWI
[1/9]
June 17, 2025 at 7:39 PM
Reposted by Maarten Sap
📣 Super excited to organize the first workshop on ✨NLP for Democracy✨ at COLM @colmweb.org!!
Check out our website: sites.google.com/andrew.cmu.e...
Call for submissions (extended abstracts) due June 19, 11:59pm AoE
#COLM2025 #LLMs #NLP #NLProc #ComputationalSocialScience
NLP 4 Democracy - COLM 2025
sites.google.com
May 21, 2025 at 4:39 PM
Reposted by Maarten Sap
Notice our new look? We're thrilled to unveil our new logo – representing our vision, values, and the future ahead. Stay tuned for more!
May 12, 2025 at 5:09 PM
super excited about this 🥰🥰
Thrilled that our paper won 🏆 Best Paper Runner-Up 🏆 at #NAACL25!!
Our work (REL-A.I.) introduces an evaluation framework that measures human reliance on LLMs and reveals how contextual features like anthropomorphism, subject, and user history can significantly influence user reliance behaviors.
April 29, 2025 at 10:57 PM
Reposted by Maarten Sap
When interacting with ChatGPT, have you wondered if it would ever "lie" to you? We found that under pressure, LLMs often choose deception. Our new #NAACL2025 paper, "AI-LIEDAR," reveals models were truthful less than 50% of the time when faced with utility-truthfulness conflicts! 🤯 1/
April 28, 2025 at 8:36 PM
Reposted by Maarten Sap
1/🚨 𝗡𝗲𝘄 𝗽𝗮𝗽𝗲𝗿 𝗮𝗹𝗲𝗿𝘁 🚨
RAG systems excel on academic benchmarks - but are they robust to variations in linguistic style?
We find RAG systems are brittle. Small shifts in phrasing trigger cascading errors, driven by the complexity of the RAG pipeline 🧵
April 17, 2025 at 7:55 PM
RLHF is built on some overly simplistic assumptions, i.e., that preferences between pairs of text are purely about quality. But this is an inherently subjective task (not unlike toxicity annotation) -- so we wanted to know: do biases similar to those in toxicity annotation emerge in reward models?
Reward models for LMs are meant to align outputs with human preferences—but do they accidentally encode dialect biases? 🤔
Excited to share our paper on biases against African American Language in reward models, accepted to #NAACL2025 Findings! 🎉
Paper: arxiv.org/abs/2502.12858 (1/10)
March 6, 2025 at 8:54 PM
My PhD student Akhila's been doing some incredible cultural work in the last few years! Check out our latest work on cultural safety and hand gestures, showing most vision and/or language AI systems are very cross-culturally unsafe!
Did you know? Gestures used to express universal concepts—like wishing for luck—vary DRAMATICALLY across cultures.
🤞 means luck in the US but is deeply offensive in Vietnam 🚨
📣 We introduce MC-SIGNS, a test bed to evaluate how LLMs/VLMs/T2I handle such nonverbal behavior!
📜: arxiv.org/abs/2502.17710
February 26, 2025 at 5:20 PM
Super excited to unveil this work! LLMs need to ask better questions, and our synthetic data corruption method can help generalize to other interesting LLM improvements (more to come on that ;) )
Asking the right questions can make or break decisions in fields like medicine, law, and beyond✴️
Our new framework ALFA—ALignment with Fine-grained Attributes—teaches LLMs to PROACTIVELY seek information by asking better questions through **structured rewards**🏥❓
(co-led with @jiminmun.bsky.social)
👉🏻🧵
February 21, 2025 at 6:10 PM
Reposted by Maarten Sap
LLM agents can code—but can they ask clarifying questions? 🤖💬
Tired of coding agents wasting time and API credits, only to output broken code? What if they asked first instead of guessing? 🚀
(New work led by Sanidhya Vijay: www.linkedin.com/in/sanidhya-...)
February 19, 2025 at 7:46 PM