Join the MilaNLP team and contribute to our upcoming research projects.
🔗 More details: milanlproc.github.io/open_positio...
⏰ Deadline: Jan 31, 2026
Follow me down a rabbit hole I'm calling "doing science is tough and I'm so busy, can't we just make up participants?"
Lots to think about how we evaluate fairness in language models!
#NLProc #fairness #LLMs
This is a real problem, but the paper's core insights aren't exactly news!
A thread with the most important takeaways... 🧵
Image: shows the LLM system prompt used
@seanjwestwood.bsky.social
We have a new paper - led by Desheng Hu, now accepted at @icwsm.bsky.social - exploring exactly that and finding many issues.
Preprint: arxiv.org/abs/2511.12920
🧵👇
@joachimbaumann.bsky.social, who will present co-authored work on "Large Language Model Hacking: Quantifying the Hidden Risks of Using LLMs for Text Annotation". Paper and information on how to join ⬇️
The promise is revolutionary for science & policy. But there’s a huge "IF": Do these simulations actually reflect reality?
To find out, we introduce SimBench: The first large-scale benchmark for group-level social simulation. (1/9)
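To make "group-level social simulation" concrete, here is a minimal sketch of how such a benchmark *could* score a simulation (my assumption for illustration, not SimBench's actual metric): compare the answer distribution an LLM produces when role-playing a group against the real survey distribution for that group. All data below is hypothetical.

```python
# Minimal sketch of group-level scoring (illustrative assumption, not
# SimBench's actual metric): compare the answer distribution an LLM gives
# when simulating a group against the real survey distribution.
from collections import Counter

OPTIONS = ["agree", "neutral", "disagree"]

def distribution(answers):
    """Turn raw answers into a probability distribution over OPTIONS."""
    counts = Counter(answers)
    return {o: counts.get(o, 0) / len(answers) for o in OPTIONS}

def total_variation(p, q):
    """TV distance: 0 means identical distributions, 1 means disjoint."""
    return 0.5 * sum(abs(p[o] - q[o]) for o in OPTIONS)

# Hypothetical data: real survey answers vs. repeated LLM samples.
survey = ["agree"] * 52 + ["neutral"] * 18 + ["disagree"] * 30
simulated = ["agree"] * 70 + ["neutral"] * 10 + ["disagree"] * 20

score = total_variation(distribution(survey), distribution(simulated))
print(f"TV distance: {score:.3f}")  # 0.180 here; lower = closer to reality
```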
✓ LLMs are brittle data annotators
✓ Downstream conclusions flip frequently: LLM hacking risk is real! (toy sketch below)
✓ Bias correction methods can help but have trade-offs
✓ Use human experts whenever possible
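The "conclusions flip" point is easy to see in a toy simulation. The sketch below is my own illustration (not code from the paper): five equally plausible annotator configurations, summarized as label-error rates, annotate the same two groups, and the downstream significance test does not always agree.

```python
# Toy sketch of "LLM hacking" (my own illustration, not the paper's code):
# the same data, annotated under different but equally plausible LLM
# configurations, can flip the downstream conclusion.
import math
import random

random.seed(0)

N = 300                               # texts per group
TRUE_RATE = {"A": 0.30, "B": 0.36}    # hypothetical true positive-label rates

def significant(k1, n1, k2, n2):
    """Two-proportion z-test at roughly alpha = 0.05."""
    p = (k1 + k2) / (n1 + n2)
    se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    return abs((k1 / n1 - k2 / n2) / se) > 1.96

def annotate(true_rate, error_rate):
    """Simulate one annotator config: each true label flips with error_rate."""
    positives = 0
    for _ in range(N):
        y = random.random() < true_rate
        if random.random() < error_rate:
            y = not y
        positives += y
    return positives

# Five plausible configs (model x prompt x temperature), summarized here as
# per-config label-error rates.
for error_rate in [0.02, 0.05, 0.08, 0.12, 0.15]:
    k_a = annotate(TRUE_RATE["A"], error_rate)
    k_b = annotate(TRUE_RATE["B"], error_rate)
    print(error_rate, significant(k_a, N, k_b, N))
# The printed verdicts typically disagree across configs: same data,
# different "plausible" pipeline, different conclusion.
```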
Paper: arxiv.org/pdf/2509.08825
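On the bias-correction point: one classic, simple option (my illustrative pick, not necessarily a method the paper evaluates) is the Rogan-Gladen estimator, which de-biases an observed prevalence using the annotator's sensitivity and specificity measured on a small gold-labeled set. The trade-off: it needs gold labels, and it breaks down as sensitivity + specificity approaches 1.

```python
# Rogan-Gladen prevalence correction: one classic bias-correction option
# (illustrative pick, not necessarily a method from the paper). It needs a
# small gold-labeled set to estimate the annotator's sensitivity and
# specificity, and it is unstable when sensitivity + specificity is near 1.

def rogan_gladen(observed_prev, sensitivity, specificity):
    """Corrected prevalence = (observed + spec - 1) / (sens + spec - 1)."""
    denom = sensitivity + specificity - 1
    if abs(denom) < 1e-9:
        raise ValueError("annotator is uninformative (sens + spec = 1)")
    corrected = (observed_prev + specificity - 1) / denom
    return min(1.0, max(0.0, corrected))  # clip to a valid proportion

# Hypothetical numbers: the LLM labels 30% of texts positive, but on the
# gold set it showed sensitivity 0.85 and specificity 0.90.
print(rogan_gladen(0.30, 0.85, 0.90))  # ~0.267: the raw 30% overestimates
```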
Last week, we -- the (Amazing) Social Computing Group -- held an internal hackathon to work on what we informally call the “Cultural Imperialism” project.
Experiment + *evidence-based* mitigation strategies in this preprint 👇
Paper: arxiv.org/pdf/2509.08825
Just joined @milanlp.bsky.social as a Postdoc, working with the amazing @dirkhovy.bsky.social on large language models and computational social science!