Michael A. Hedderich
@mhedderich.bsky.social
Research group leader at LMU Munich and MCML on ML, NLP & HCI. Also experimenting with lemonade that glows in the dark 🥤 (he/him)
Check out our survey at #EMNLP2025 and help build a future where low-resource languages including African languages are represented in NLP!
Paper: arxiv.org/abs/2505.21315
This work was led (in a great way) by Jesujoba Alabi, together with David Adelani and Dietrich Klakow.
Charting the Landscape of African NLP: Mapping Progress and Shaping the Road Ahead
With over 2,000 languages and potentially millions of speakers, Africa represents one of the richest linguistic regions in the world. Yet, this diversity is scarcely reflected in state-of-the-art natu...
arxiv.org
October 2, 2025 at 8:43 PM
Based on the analysis, we suggest future directions including:
1️⃣ Scale beyond the top-10 high-resource languages
2️⃣ Build more multicultural, native-language datasets
3️⃣ Develop African-centric LLMs
4️⃣ Focus on human-centered, application-driven NLP
October 2, 2025 at 8:43 PM
Key findings include:
1️⃣ Papers have increased rapidly in the last 5 years 📈
2️⃣ Research is skewed toward certain tasks like MT and NLU
3️⃣ Language coverage is uneven, with a few languages dominating
October 2, 2025 at 8:43 PM
We cover datasets, tasks, methods, and themes across 25+ venues (NLP, speech, HCI, ML), and manually analyzed 884 papers for this survey.
October 2, 2025 at 8:43 PM
We have 3 main goals:
1️⃣ Comprehensive Overview – Map the research landscape
2️⃣ Accessible Entry Point – Easy starting point for new researchers
3️⃣ Open Issues – Highlight gaps and challenges
October 2, 2025 at 8:43 PM
Despite resource gaps, NLP research on African languages is far from dormant. Growth is fueled by community initiatives, large multilingual corpora, shared tasks, and dedicated venues, making this a great time to chart the field.
October 2, 2025 at 8:43 PM
Joint work with Anyi Wang, @raoyuan.bsky.social, @florian-eichin.com, Jonas Fischer, and @barbaraplank.bsky.social
Check out the paper at arxiv.org/abs/2504.158... or discuss the work with us at #ACL2025 in Vienna.
July 11, 2025 at 10:43 AM
Through
📊 3 new benchmarks with ground truth
📚 evaluation on existing prompt data
🛠 demonstration studies, and
🙇 a user study
we show how Spotlight can reliably provide new insights and support users in uncovering relevant differences in bias, cultural artifacts, language style, model failure,...
July 11, 2025 at 10:43 AM
Spotlight uses data mining + human analysis to support users in better understanding the behavior of LLMs 🔎
We leverage token patterns to automatically distinguish between random (decoding) variations and systematic differences in LLM outputs and guide the user in their nuanced analysis.
July 11, 2025 at 10:43 AM
Spotlight uses data mining + human analysis to support users in better understanding the behavior of LLMs 🔎
We leverage token patterns to automatically distinguish between random (decoding) variations and systematic differences in LLM outputs and guide the user in their nuanced analysis.
July 11, 2025 at 10:38 AM