CIS, LMU Munich
cislmu.bsky.social
Center for Information and Language Processing (CIS): NLP research group at LMU Munich led by Hinrich Schuetze and @barbaraplank.bsky.social
🗨️ Beyond “noisy” text: How (and why) to process dialect data
🔎 Keynote talk at WNUT @ NAACL
👥 @verenablaschke.bsky.social
📁 Workshop on Noisy and User-generated Text (May 3)
The full workshop programme is here: noisy-text.github.io/2025/
bsky.app/profile/vere...
April 29, 2025 at 3:03 PM
📝 Privacy-Preserving Federated Learning for Hate Speech Detection
🔎 We present a federated learning system with differential privacy and fine-tuned ALBERT models for low-resource hate speech detection.
👥 Ivo Júnior, @htyeh1, Axel Wisiorek, @HinrichSchuetze
📁 SRW - Long
April 29, 2025 at 3:03 PM
📝 Linguistic Features in German BERT: The Role of Morphology, Syntax, and Semantics in Multi-Class Text Classification
🔎 Analysis of linguistic features used by German BERT in a classification task.
👥 Henrike Beyer (University of Dundee), Diego Frassinelli
📁 SRW - Short
April 29, 2025 at 3:03 PM
📝 XAMPLER: Learning to Retrieve Cross-Lingual In-Context Examples
🔎 A simple yet effective method to retrieve cross-lingual few-shot examples for multilingual in-context learning.
👥 @lpq29743, @andre_t_martins, @HinrichSchuetze
🔗 arxiv.org/abs/2405.05116
📁 Findings - Short
Recent studies indicate that leveraging off-the-shelf or fine-tuned retrievers, capable of retrieving relevant in-context examples tailored to the input query, enhances few-shot in-context learning of...
April 29, 2025 at 3:03 PM
📝 Dialetto, ma Quanto Dialetto? Transcribing and Evaluating Dialects on a Continuum
🔎 We predict speech-to-text model performance on dialect continua with geostatistics.
👥 Ryan Soh-Eun Shim, Barbara Plank
🔗 arxiv.org/abs/2410.14589
📁 Findings - Long
There is increasing interest in looking at dialects in NLP. However, most work to date still treats dialects as discrete categories. For instance, evaluative work in variation-oriented NLP for English...
April 29, 2025 at 3:03 PM
📝 A Recipe of Parallel Corpora Exploitation for Multilingual Large Language Models
🔎 An investigation of the impact of parallel corpora, ... on the performance of multilingual LLMs.
👥 @lpq29743, @andre_t_martins, @HinrichSchuetze
🔗 arxiv.org/abs/2407.00436
📁 Findings - Long
Recent studies have highlighted the potential of exploiting parallel corpora to enhance multilingual large language models, improving performance in both bilingual tasks, e.g., machine translation, an...
April 29, 2025 at 3:03 PM