Beiduo Chen
@beiduo.bsky.social
ELLIS PhD student in NLP @MaiNLPlab, @CisLmu, @LMU_Muenchen
https://mckysse.github.io/
Matching exact probabilities for HLV is unstable, so we propose a more robust rank-based evaluation that checks the preference order over labels. Our combined method outperforms baselines on three datasets that exhibit human label variation, showing it aligns better with diverse human perspectives.
October 24, 2025 at 1:37 PM
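Not from the paper, just a minimal sketch of the contrast described above: comparing a model's label distribution to the human one by exact probabilities (KL divergence) versus by preference order (Spearman rank correlation). The distributions and the specific metric choices here are illustrative assumptions.

```python
# Minimal sketch (not the paper's exact metric): compare a model's label
# distribution to a human judgment distribution both by exact probabilities
# (KL divergence) and by preference order (Spearman rank correlation).
import numpy as np
from scipy.special import rel_entr
from scipy.stats import spearmanr

human = np.array([0.55, 0.30, 0.15])   # hypothetical NLI soft label: entailment / neutral / contradiction
model = np.array([0.40, 0.35, 0.25])   # hypothetical model distribution

kl = rel_entr(human, model).sum()      # sensitive to small probability shifts
rho, _ = spearmanr(human, model)       # only cares about the ranking of labels

print(f"KL divergence: {kl:.3f}, rank correlation: {rho:.2f}")
```

Even when the exact probabilities drift apart, the rank correlation stays at 1.0 as long as the label ordering is preserved, which is the robustness the rank-based evaluation targets.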
Instead of relying on unnatural post-hoc explanations, we look forward: a model's CoT already contains rationales for all options. We introduce CoT2EL, a pipeline that uses linguistic discourse segmenters to extract these high-quality, faithful rationale units and explore human label variation.
October 24, 2025 at 1:37 PM
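A rough sketch of the CoT2EL idea under stated assumptions: the actual pipeline uses linguistic discourse segmenters, while this toy version stands in plain spaCy sentence splitting plus a naive keyword match to assign each CoT segment to an NLI label. The CoT text and the matching rule are invented for illustration.

```python
# Toy sketch of extracting per-label rationale units from a CoT.
# The real pipeline uses a discourse segmenter; spaCy sentence
# splitting is only a stand-in here.
import spacy

nlp = spacy.load("en_core_web_sm")  # assumes the small English model is installed

cot = ("The premise says the man is outdoors, which supports entailment. "
       "However, the hypothesis adds that it is raining, which is not stated, "
       "so neutral is also plausible. Nothing directly contradicts the hypothesis, "
       "so contradiction seems unlikely.")

labels = ["entailment", "neutral", "contradiction"]
units = {label: [] for label in labels}

for sent in nlp(cot).sents:                 # segment the CoT into candidate rationale units
    for label in labels:
        if label in sent.text.lower():      # naive assignment; the real pipeline is more careful
            units[label].append(sent.text.strip())

for label, rationale in units.items():
    print(label, "->", rationale)
```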
🌍 Broader impact:
Our approach makes capturing disagreement scalable, helping build datasets that reflect real-world ambiguity—without requiring tons of human-written explanations.
Open-sourcing:
📂 github.com/mainlp/MJD-Estimator
GitHub – mainlp/MJD-Estimator: implementation of the EMNLP 2024 paper "Seeing the Big through the Small": Can LLMs Approximate Human Judgment Distributions on NLI from a Few Explanations? and the ACL 2025 paper.
July 15, 2025 at 2:51 PM
🧠 What’s this about?
Human annotations often disagree. Instead of collapsing disagreement into a single label, we model Human Judgment Distributions — how likely humans are to choose each label in NLI tasks.
Capturing this is crucial for interpretability and uncertainty estimation in NLP.
July 15, 2025 at 2:50 PM
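A tiny illustrative sketch (not from the paper) of what a human judgment distribution is: repeated annotations on one NLI item turned into a soft label rather than a single majority vote. The votes below are hypothetical.

```python
# Turn several annotators' labels for one NLI item into a
# human judgment distribution (soft label) instead of one hard label.
from collections import Counter

LABELS = ["entailment", "neutral", "contradiction"]
annotations = ["neutral", "entailment", "neutral", "neutral", "contradiction"]  # hypothetical votes

counts = Counter(annotations)
hjd = [counts[label] / len(annotations) for label in LABELS]

print(dict(zip(LABELS, hjd)))  # {'entailment': 0.2, 'neutral': 0.6, 'contradiction': 0.2}
```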