Athiya Deviyani
@athiya.bsky.social
LTI PhD at CMU on evaluation and trustworthy ML/NLP, prev AI&CS Edinburgh University, Google, YouTube, Apple, Netflix. Views are personal 👩🏻‍💻🇮🇩

athiyadeviyani.github.io
Reposted by Athiya Deviyani
🖋️ Curious how writing differs across (research) cultures?
🚩 Tired of “cultural” evals that don't consult people?

We engaged with interdisciplinary researchers to identify & measure ✨cultural norms✨ in scientific writing, and show that ❗LLMs flatten them❗

📜 arxiv.org/abs/2506.00784

[1/11]
June 9, 2025 at 11:30 PM
Excited to be in Albuquerque for #NAACL2025 🏜️ presenting our poster "Contextual Metric Meta-Evaluation by Measuring Local Metric Accuracy"!

Come find me at
📍 Hall 3, Session B
🗓️ Wednesday, April 30 (tomorrow!)
🕚 11:00–12:30

Let’s talk about all things eval! 📊
April 30, 2025 at 2:39 AM
Reposted by Athiya Deviyani
If you're at NAACL this week (or just want to keep track), I have a feed for you: bsky.app/profile/did:...

Currently pulling everyone that mentions NAACL, posts a link from the ACL Anthology, or has NAACL in their username. Happy conferencing!
April 29, 2025 at 6:07 PM
Reposted by Athiya Deviyani
Can self-supervised models 🤖 understand allophony 🗣? Excited to share my new #NAACL2025 paper: Leveraging Allophony in Self-Supervised Speech Models for Atypical Pronunciation Assessment arxiv.org/abs/2502.07029 (1/n)
April 29, 2025 at 5:00 PM
Reposted by Athiya Deviyani
🚀 Excited to share a new interp+agents paper: 🐭🐱 MICE for CATs: Model-Internal Confidence Estimation for Calibrating Agents with Tools appearing at #NAACL2025

This was work done @msftresearch.bsky.social last summer with Jason Eisner, Justin Svegliato, Ben Van Durme, Yu Su, and Sam Thomson

1/🧵
April 29, 2025 at 1:41 PM
Reposted by Athiya Deviyani
When interacting with ChatGPT, have you wondered if it would ever "lie" to you? We found that under pressure, LLMs often choose deception. Our new #NAACL2025 paper, "AI-LIEDAR," reveals models were truthful less than 50% of the time when faced with utility-truthfulness conflicts! 🤯 1/
April 28, 2025 at 8:36 PM
Ever trusted a metric that works great on average, only for it to fail in your specific use case?

In our #NAACL2025 paper (w/ @841io.bsky.social), we show why global evaluations are not enough and why context matters more than you think.

📄 aclanthology.org/2025.finding...
#NLP #Evaluation

(🧵1/9)
April 29, 2025 at 5:10 PM