MaiNLP lab, LMU Munich
@mainlp.bsky.social
MaiNLP research lab at CIS, LMU Munich, directed by Barbara Plank @barbaraplank.bsky.social


Natural Language Processing | Artificial Intelligence | Computational Linguistics | Human-centric NLP
Awesome! We're also currently creating one and have included yours as a starter :)
August 11, 2025 at 12:19 PM
👥 @boleima.bsky.social, Yuting Li, Wei Zhou, Ziwei Gong, @janetlauyeung.bsky.social, Katja Jasinskaja, @annefriedrich.bsky.social, Julia Hirschberg, Frauke Kreuter, @barbaraplank.bsky.social
July 23, 2025 at 12:32 PM
📝 Analyzing the Effect of Linguistic Similarity on Cross-Lingual Transfer: Tasks and Experimental Setups Matter
🔎 263 languages, 10 similarity measures, 3 NLP tasks
👥 @verenablaschke.bsky.social, Masha Fedzechkina, @maartjeterhoeve.bsky.social
🔗 arxiv.org/abs/2501.14491
📁 Findings - Long
July 23, 2025 at 12:30 PM
📝 Do LLMs Give Psychometrically Plausible Responses in Educational Assessments?
🔎 Analyzing how human-like LLMs are when taking reading, history, and economics tests
👥 @saeub.bsky.social, Diego Frassinelli, @barbaraplank.bsky.social
🔗 arxiv.org/abs/2506.09796
📁 BEA Workshop - Long
July 23, 2025 at 12:30 PM
📝 GerMedIQ: A Resource for Simulated and Synthesized Anamnesis Interview Responses in German
🔎 We release a novel German anamnesis question-response dataset with human-simulated and LLM-augmented responses.
👥 @JHofenbitzer et al.
🔗 github.com/Jhofenbitzer...
📁 SRW - Long
July 23, 2025 at 12:30 PM
📝 Probing LLMs for Multilingual Discourse Generalization Through a Unified Label Set
🔎 Do LLMs encode and generalize discourse knowledge across languages?
👥 @florian-eichin.com, @janetlauyeung.bsky.social, @mhedderich.bsky.social, @barbaraplank.bsky.social
🔗 arxiv.org/abs/2503.10515
📁 Main - Long
July 23, 2025 at 12:30 PM
📝 LLMs instead of Human Judges? A Large Scale Empirical Study across 20 NLP Evaluation Tasks
🔎 We present a large-scale study of whether LLM judgments can be reliably used as proxies for human judgments
👥 Anna Bavaresco et al.
🔗 arxiv.org/abs/2406.18403
📁 Main - Short
July 23, 2025 at 12:30 PM
📝 What's the Difference? Supporting Users in Identifying the Effects of Prompt and Model Changes Through Token Patterns
👥 @mhedderich.bsky.social, Anyi Wang, @raoyuan.bsky.social, @florian-eichin.com, Jonas Fischer, @barbaraplank.bsky.social
🔗 arxiv.org/abs/2504.158...
📁 Main - Long
July 23, 2025 at 12:30 PM
📝 A Rose by Any Other Name: LLM-Generated Explanations Are Good Proxies for Human Explanations to Collect Label Distributions on NLI
👥 @beiduo.bsky.social, Siyao Peng, @annakorhonen.bsky.social, @barbaraplank.bsky.social
🔗 arxiv.org/abs/2412.13942
📁 Findings - Long
July 23, 2025 at 12:30 PM
📝 Circuit Compositions: Exploring Modular Structures in Transformer-Based Language Models
🔎 We study the relationship between circuits for highly compositional and functionally related tasks
👥 @pmondorf.bsky.social, Sondre Wold, @barbaraplank.bsky.social
🔗 arxiv.org/abs/2410.01434
📁 Main - Long
July 23, 2025 at 12:30 PM
📝 Pragmatics in the Era of Large Language Models: A Survey on Datasets, Evaluation, Opportunities and Challenges
🔎 We review existing datasets for evaluating LLMs’ pragmatic capabilities, outlining key challenges and promising future directions
🔗 arxiv.org/abs/2502.12378
📁 Main - Long
July 23, 2025 at 12:30 PM
📝 Algorithmic Fidelity of Large Language Models in Generating Synthetic German Public Opinions: A Case Study
🔎 This study evaluates LLMs in generating German public opinions using open-ended survey data
🔗 arxiv.org/abs/2412.13169
📁 Main - Long
July 23, 2025 at 12:30 PM
Reposted by MaiNLP lab, LMU Munich
📄 [ACL 2025 main] Circuit Compositions: Exploring Modular Structures in Transformer-Based Language Models (doi.org/10.48550/arX...)
Circuit Compositions: Exploring Modular Structures in Transformer-Based Language Models
A fundamental question in interpretability research is to what extent neural networks, particularly language models, implement reusable functions through subnetworks that can be composed to perform mo...
doi.org
July 18, 2025 at 10:19 AM
Reposted by MaiNLP lab, LMU Munich
📄 [ACL 2025 main] LLMs instead of Human Judges? A Large Scale Empirical Study across 20 NLP Evaluation Tasks (doi.org/10.48550/arX...)
LLMs instead of Human Judges? A Large Scale Empirical Study across 20 NLP Evaluation Tasks
There is an increasing trend towards evaluating NLP models with LLMs instead of human judgments, raising questions about the validity of these evaluations, as well as their reproducibility in the case...
doi.org
July 18, 2025 at 10:19 AM