Cesare
@cesare-spinoso.bsky.social
Hello! I'm Cesare (pronounced Chez-array). I'm a PhD student at McGill/Mila working in NLP/computational pragmatics.

@mcgill-nlp.bsky.social
@mila-quebec.bsky.social
https://cesare-spinoso.github.io/
Reposted by Cesare
A new paper accepted at @colmweb.org COLM 2025! I led a group of 3 brilliant students to dive deep into the problem of discrimination in language models. We discovered that models that make racist decisions don’t always have biased thoughts!
July 25, 2025 at 12:03 AM
Reposted by Cesare
Our new paper in #PNAS (bit.ly/4fcWfma) presents a surprising finding—when words change meaning, older speakers rapidly adopt the new usage; inter-generational differences are often minor.

w/ Michelle Yang, @sivareddyg.bsky.social, @msonderegger.bsky.social and @dallascard.bsky.social 👇(1/12)
July 29, 2025 at 12:06 PM
Reposted by Cesare
What do systematic hallucinations in LLMs tell us about their generalization abilities?

Come to our poster at #ACL2025 on July 29th at 4 PM in Level 0, Halls X4/X5. Would love to chat about interpretability, hallucinations, and reasoning :)

@mcgill-nlp.bsky.social @mila-quebec.bsky.social
July 28, 2025 at 9:18 AM
How can we use models of cognition to help LLMs interpret figurative language (irony, hyperbole) in a more human-like manner? Come to our #ACL2025NLP poster on Wednesday at 11AM (exhibit hall - exact location TBA) to find out! @mcgill-nlp.bsky.social @mila-quebec.bsky.social @aclmeeting.bsky.social
July 28, 2025 at 9:16 AM
A blizzard is raging through Montreal when your friend says “Looks like Florida out there!” Humans easily interpret irony, while LLMs struggle with it. We propose a 𝘳𝘩𝘦𝘵𝘰𝘳𝘪𝘤𝘢𝘭-𝘴𝘵𝘳𝘢𝘵𝘦𝘨𝘺-𝘢𝘸𝘢𝘳𝘦 probabilistic framework as a solution.
Paper: arxiv.org/abs/2506.09301 to appear @ #ACL2025 (Main)
June 26, 2025 at 3:52 PM
Reposted by Cesare
Started a new podcast with @tomvergara.bsky.social !

Behind the Research of AI:
We look behind the scenes, beyond the polished papers 🧐🧪

If this sounds fun, check out our first "official" episode with the awesome Gauthier Gidel from @mila-quebec.bsky.social:

open.spotify.com/episode/7oTc...
02 | Gauthier Gidel: Bridging Theory and Deep Learning, Vibes at Mila, and the Effects of AI on Art
Behind the Research of AI · Episode
open.spotify.com
June 25, 2025 at 3:54 PM
Reposted by Cesare
"Build the web for agents, not agents for the web"

This position paper argues that rather than forcing web agents to adapt to UIs designed for humans, we should develop a new interface optimized for web agents, which we call Agentic Web Interface (AWI).

arxiv.org/abs/2506.10953
June 14, 2025 at 4:17 AM
Reposted by Cesare
New paper in Interspeech 2025 🚨
@interspeech.bsky.social

A Robust Model for Arabic Dialect Identification using Voice Conversion

Paper 📝 arxiv.org/pdf/2505.24713
Demo 🎙️ https://shorturl.at/rrMm6

#Arabic #SpeechTech #NLProc #AI #Speech #ArabicDialects #Interspeech2025 #ArabicNLP
June 10, 2025 at 10:07 AM
Reposted by Cesare
Do LLMs hallucinate randomly? Not quite.

Our #ACL2025 (Main) paper shows that hallucinations under irrelevant contexts follow a systematic failure mode — revealing how LLMs generalize using abstract classes + context cues, albeit unreliably.

📎 Paper: arxiv.org/abs/2505.22630 1/n
June 6, 2025 at 6:10 PM
Reposted by Cesare
Congratulations to Mila members @adadtur.bsky.social , Gaurav Kamath and @sivareddyg.bsky.social for their SAC award at NAACL! Check out Ada's talk in Session I: Oral/Poster 6. Paper: arxiv.org/abs/2502.05670
May 1, 2025 at 2:30 PM
Reposted by Cesare
Ada is an undergrad and will soon be looking for PhD positions. Gaurav is a PhD student looking for intellectually stimulating internships/visiting positions. They did most of the work without much of my help. Highly recommend them. Please reach out to them if you have any positions.
Language Models Largely Exhibit Human-like Constituent Ordering Preferences
Though English sentences are typically inflexible vis-à-vis word order, constituents often show far more variability in ordering. One prominent theory presents the notion that constituent ordering is ...
arxiv.org
May 1, 2025 at 3:14 PM
Reposted by Cesare
Great work from labmates comparing LLMs and humans on linguistic preferences: you know when a sentence kind of feels off, e.g. "I met at the park the man"? So in what ways do LLMs follow these human intuitions?
Congratulations to Mila members @adadtur.bsky.social , Gaurav Kamath and @sivareddyg.bsky.social for their SAC award at NAACL! Check out Ada's talk in Session I: Oral/Poster 6. Paper: arxiv.org/abs/2502.05670
May 1, 2025 at 3:04 PM
Reposted by Cesare
Instruction-following retrievers can efficiently and accurately search for harmful and sensitive information on the internet! 🌐💣

Retrievers need to be aligned too! 🚨🚨🚨

Work done with the wonderful Nick and @sivareddyg.bsky.social

🔗 mcgill-nlp.github.io/malicious-ir/
Thread: 🧵👇
Exploiting Instruction-Following Retrievers for Malicious Information Retrieval
Parishad BehnamGhader, Nicholas Meade, Siva Reddy
mcgill-nlp.github.io
March 12, 2025 at 4:15 PM
Reposted by Cesare
How to Get Your LLM to Generate Challenging Problems for Evaluation? 🤔 Check out our CHASE recipe. A highly relevant problem given that most human-curated datasets are crushed within days.
Presenting ✨ 𝐂𝐇𝐀𝐒𝐄: 𝐆𝐞𝐧𝐞𝐫𝐚𝐭𝐢𝐧𝐠 𝐜𝐡𝐚𝐥𝐥𝐞𝐧𝐠𝐢𝐧𝐠 𝐬𝐲𝐧𝐭𝐡𝐞𝐭𝐢𝐜 𝐝𝐚𝐭𝐚 𝐟𝐨𝐫 𝐞𝐯𝐚𝐥𝐮𝐚𝐭𝐢𝐨𝐧 ✨

Work w/ fantastic advisors Dima Bahdanau and @sivareddyg.bsky.social

Thread 🧵:
February 21, 2025 at 6:53 PM
Reposted by Cesare
Introducing MVL-SIB, a massively multilingual vision-language benchmark for cross-modal topic matching in 205 languages!

🤔 Tasks: Given images (sentences), select the topically matching sentence (image).

Arxiv: arxiv.org/abs/2502.12852
HF: huggingface.co/datasets/Wue...

Details👇
February 21, 2025 at 7:46 AM
Reposted by Cesare
Y’all we won!!!!!!!!! 🇨🇦
February 21, 2025 at 4:32 AM
Reposted by Cesare
The submission deadline is in less than a month! We welcome encore submissions, so consider submitting your work regardless of whether it has already been accepted elsewhere #chi2025 😉
Human-centered Evaluation and Auditing of Language Models (HEAL) workshop is back for #CHI2025, with this year's special theme: “Mind the Context”! Come join us on this bridge between #HCI and #NLProc!

Workshop submission deadline: Feb 17 AoE
More info at heal-workshop.github.io.
January 22, 2025 at 3:32 PM
Reposted by Cesare
Human-centered Evaluation and Auditing of Language Models (HEAL) workshop is back for #CHI2025, with this year's special theme: “Mind the Context”! Come join us on this bridge between #HCI and #NLProc!

Workshop submission deadline: Feb 17 AoE
More info at heal-workshop.github.io.
December 16, 2024 at 10:07 PM
Reposted by Cesare
It turns out we had even more papers at EMNLP!

Let's complete the list with three more🧵
Our lab members recently presented 3 papers at @emnlpmeeting.bsky.social in Miami ☀️ 📜

From interpretability to bias/fairness and cultural understanding -> 🧵
November 24, 2024 at 2:17 AM
Reposted by Cesare
Our lab members recently presented 3 papers at @emnlpmeeting.bsky.social in Miami ☀️ 📜

From interpretability to bias/fairness and cultural understanding -> 🧵
November 23, 2024 at 8:35 PM
Reposted by Cesare
I’m putting together a starter pack for researchers working on human-centered AI evaluation. Reply or DM me if you’d like to be added, or if you have suggestions! Thank you!

(It looks NLP-centric at the moment, but that’s due to the current limits of my own knowledge 🙈)

go.bsky.app/G3w9LpE
November 21, 2024 at 3:56 PM
Reposted by Cesare
I didn’t expect to wind up in the news over this but in hindsight, I guess it makes sense lol.

This is the first time I’ve been in the Herald since high school 😂.
November 20, 2024 at 3:17 AM