Dirk Hovy
dirkhovy.bsky.social
Professor @milanlp.bsky.social for #NLProc, compsocsci, #ML
Also at http://dirkhovy.com/
Reposted by Dirk Hovy
You missed one: G. Abercrombie, T. Dinkar, A. Cercas Curry, V. Rieser & @dirkhovy.bsky.social. Consistency is Key: Disentangling label variation in NLP with Intra-Annotator Agreement. @nlperspectives.bsky.social
November 3, 2025 at 2:34 AM
Excited to head to Suzhou for the 30th edition of #EMNLP2025! 🎉 Had the great honor to serve as general chair this year. Looking forward to catching up with everyone and seeing some amazing #NLP research! 🤓📚
November 2, 2025 at 5:54 AM
Reposted by Dirk Hovy
🗓️ Nov 5 – Main Conference Posters
Personalization up to a Point
🧠 In the context of content moderation, we show that fully personalized models can perpetuate hate speech, and propose a policy-based method to impose legal boundaries.
📍 Hall C | 11:00–12:30
October 31, 2025 at 2:05 PM
Reposted by Dirk Hovy
🗓️ Nov 5 – Main Conference Posters
📘 Biased Tales
A dataset of 5k short LLM bedtime stories generated across sociocultural axes with an evaluation taxonomy for character-centric attributes and context-centric attributes.
📍 Hall C | 11:00–12:30
October 31, 2025 at 2:05 PM
Reposted by Dirk Hovy
🗓️ Nov 5 – Demo
Co-DETECT: Collaborative Discovery of Edge Cases in Text Classification
🧩 Co-DETECT – an iterative, human-LLM collaboration framework for surfacing edge cases and refining annotation codebooks in text classification.
📍 Demo Session 2 – Hall C3 | 14:30–16:00
October 31, 2025 at 2:06 PM
Reposted by Dirk Hovy
🗓️ Nov 6 – Findings Posters
The “r” in “woman” stands for rights.
💬 We propose a taxonomy of social dynamics in implicit misogyny (EN,IT), auditing 9 LLMs — and they consistently fail. The more social knowledge a message requires, the worse they perform.
📍 Hall C | 12:30–13:30
October 31, 2025 at 2:06 PM
Reposted by Dirk Hovy
🗓️ Nov 7 – Main Conference Posters
Principled Personas: Defining and Measuring the Intended Effects of Persona Prompting on Task Performance
🧍 Discussing different applications for LLM persona prompting, and how to measure their success.
📍 Hall C | 10:30–12:00
October 31, 2025 at 2:06 PM
Reposted by Dirk Hovy
🗓️ Nov 7 – Main Conference Posters
TrojanStego: Your Language Model Can Secretly Be a Steganographic Privacy-Leaking Agent
🔒 LLMs can be fine-tuned to leak secrets via token-based steganography!
📍 Hall C | 10:30–12:00
October 31, 2025 at 2:06 PM
Reposted by Dirk Hovy
🗓️ Nov 8 – WiNLP Workshop
No for Some, Yes for Others
🤖 We investigate how sociodemographic persona prompts affect false refusal behaviors in LLMs. Model and task type are the dominant factors driving these refusals.
October 31, 2025 at 2:06 PM
Reposted by Dirk Hovy
🗓️ Nov 8 – NLPerspectives Workshop
Balancing Quality and Variation
🧮 For datasets to represent diverse opinions, they must preserve variation while filtering out spam. We evaluate annotator filtering heuristics and show how they often remove genuine variation.
October 31, 2025 at 2:07 PM
Reposted by Dirk Hovy
🗓️ Nov 8 – BabyLM Workshop
Teacher Demonstrations in a BabyLM's Zone of Proximal Development for Contingent Multi-Turn Interaction
👶 ContingentChat, a Teacher–Student framework that benchmarks and improves multi-turn contingency in a BabyLM trained on 100M words.
October 31, 2025 at 2:07 PM
Reposted by Dirk Hovy
🗓️ Nov 8 – STARSEM Workshop
Generalizability of Media Frames: Corpus Creation and Analysis Across Countries
📰 We investigate how well media frames generalize across different media landscapes. The 15 MFC frames remain broadly applicable, with minor revisions of the guidelines.
October 31, 2025 at 2:07 PM
Reposted by Dirk Hovy
🗓️ Nov 6 – Oral Presentation (TACL)
IssueBench: Millions of Realistic Prompts for Measuring Issue Bias in LLM Writing Assistance
⚖️ A foundation for measuring LLM political bias in realistic user conversations.
📍 A303 | 10:30–12:00
October 31, 2025 at 2:07 PM
Reposted by Dirk Hovy
Proud to present our #EMNLP2025 papers!
Catch our team across Main, Findings, Workshops & Demos 👇
October 31, 2025 at 2:04 PM
Reposted by Dirk Hovy
There’s plenty of evidence for political bias in LLMs, but very few evals reflect realistic LLM use cases — which is where bias actually matters.

IssueBench, our attempt to fix this, is accepted at TACL, and I will be at #EMNLP2025 next week to talk about it!

New results 🧵
Are LLMs biased when they write about political issues?

We just released IssueBench – the largest, most realistic benchmark of its kind – to answer this question more robustly than ever before.

Long 🧵with spicy results 👇
October 29, 2025 at 4:12 PM
Reposted by Dirk Hovy
Can LLMs learn to simulate individuals' judgments based on their demographics?

Not quite! In our new paper, we found that LLMs do not learn information about demographics, but instead learn individual annotators' patterns based on unique combinations of attributes!

🧵
April 14, 2025 at 1:18 PM
Reposted by Dirk Hovy
LLMs are good at simulating human behaviours, but they are not going to be great unless we train them to.

We hope SimBench can be the foundation for more specialised development of LLM simulators.

I really enjoyed working on this with @tiancheng.bsky.social et al. Many fun results 👇
Can AI simulate human behavior? 🧠
The promise is revolutionary for science & policy. But there’s a huge "IF": Do these simulations actually reflect reality?
To find out, we introduce SimBench: The first large-scale benchmark for group-level social simulation. (1/9)
October 28, 2025 at 5:58 PM
Reposted by Dirk Hovy
Check out the paper and data for details!
Paper: arxiv.org/abs/2510.17516
Data: huggingface.co/datasets/pit...
Website: simbench.tiancheng.hu (9/9)
October 28, 2025 at 4:54 PM
Reposted by Dirk Hovy
SimBench is a big, unified benchmark built from 20 diverse datasets with a global participant pool.
It spans moral dilemmas, economic games, psych assessments & more to rigorously test how well LLMs can predict group-level human responses across a wide range of tasks. (2/9)
October 28, 2025 at 4:54 PM
Reposted by Dirk Hovy
Can AI simulate human behavior? 🧠
The promise is revolutionary for science & policy. But there’s a huge "IF": Do these simulations actually reflect reality?
To find out, we introduce SimBench: The first large-scale benchmark for group-level social simulation. (1/9)
October 28, 2025 at 4:54 PM
Reposted by Dirk Hovy
#MemoryModay #NLProc 'Benchmarking Post-Hoc Interpretability Approaches for Transformer-based Misogyny Detection' - Attanasio et al. Explores reliability of interpretability in hate speech detection.
Benchmarking Post-Hoc Interpretability Approaches for Transformer-based Misogyny Detection
Giuseppe Attanasio, Debora Nozza, Eliana Pastor, Dirk Hovy. Proceedings of NLP Power! The First Workshop on Efficient Benchmarking in NLP. 2022.
aclanthology.org
October 20, 2025 at 3:23 PM
Reposted by Dirk Hovy
Over the past two days, I participated in the @erc.europa.eu Workshop on Data Access under DSA Article 40.

An enriching experience that deepened my understanding of the DSA's implications for research and enabled me to connect with exceptional media researchers.

erc.europa.eu/news-events/...
ERC Workshop on data access under the Digital Services Act (DSA) Article 40 (opening session)
The Digital Services Act (DSA) is European legislation that specifies a set of rules to make the digital space safer and more trustworthy for users.
erc.europa.eu
October 23, 2025 at 5:02 PM
Reposted by Dirk Hovy
#MemoryModay #NLProc 'Dense Node Representation for Geolocation' by Fornaciari & @dirkhovy.bsky.social reveals efficient geolocation methods using node2vec & doc2vec models. Greater network size, fewer parameters.
Dense Node Representation for Geolocation
Tommaso Fornaciari, Dirk Hovy. Proceedings of the 5th Workshop on Noisy User-generated Text (W-NUT 2019). 2019.
aclanthology.org
October 27, 2025 at 4:06 PM
Reposted by Dirk Hovy
With the conference fast approaching, we want to offer a huge thank you to our incredible #EMNLP2025 sponsors! 💙 Your support helped keep student registration fees low and made this a record year — both in number of sponsors and total sponsorship funds! 🙏
2025.emnlp.org/sponsors/
Sponsors
Official website for the 2025 Conference on Empirical Methods in Natural Language Processing
2025.emnlp.org
October 24, 2025 at 8:26 PM