#Multilinguality
@kellymarchisio.bsky.social from Cohere presents “Building Multilingual LLMs in Industry”, sharing insights on training multilinguality at scale!
November 9, 2025 at 1:26 AM
Presenting at #EMNLP2025 in a moment, session on "Multilinguality and Language Diversity 2" (A301). Our paper on Tokenization Fairness: arxiv.org/abs/2509.20045
Tokenization and Representation Biases in Multilingual Models on Dialectal NLP Tasks
Dialectal data are characterized by linguistic variation that appears small to humans but has a significant impact on the performance of models. This dialect gap has been related to various factors (e...
arxiv.org
November 6, 2025 at 9:32 AM
I'm in Suzhou to present our work on MultiBLiMP, Friday @ 11:45 in the Multilinguality session (A301)!

Come check it out if you're interested in multilingual linguistic evaluation of LLMs (there will be parse trees on the slides! There's still use for syntactic structure!)

arxiv.org/abs/2504.02768
November 6, 2025 at 7:08 AM
November 4, 2025 at 10:52 AM
📝 What aspects do reviewers focus on in peer reviews?
UKP Lab researchers present a framework for automated aspect analysis that helps to understand how reviewers evaluate papers by identifying criteria such as Novelty, Soundness, or Dataset validity.

(1/🧵)
November 4, 2025 at 10:49 AM
This week at #EMNLP2025, I'll present our research on pretraining a multilingual pixel language model. Join the multilinguality session on Friday at 10:30 in Room A301 to learn more about pixel models and their benefits in multilingual settings. (Unfortunately I’ll be on Zoom)
November 3, 2025 at 5:39 PM
We are presenting this paper at #EMNLP2025 in the “Multilinguality and Language Diversity” oral session this Wednesday (November 5th) from 11:00-12:30 (UTC+8). Paper: aclanthology.org/2025.emnlp-m... Code: github.com/LAGoM-NLP/Co...
Confounding Factors in Relating Model Performance to Morphology
Wessel Poelman, Thomas Bauwens, Miryam de Lhoneux. Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing. 2025.
aclanthology.org
November 3, 2025 at 11:53 AM
Excited to head over to #Suzhou to present 5 papers at #EMNLP2025 and affiliated venues!
Topics include quality estimation and evaluation 👩‍🔬, speech 🗣️, and multilinguality 🌐 - see you soon! 🤩
@maikezufle.bsky.social @jan-niehues.bsky.social
October 31, 2025 at 1:34 PM
How well do LLMs handle multilinguality? 🌍🤖

🔬We brought the rigor from Machine Translation evaluation to multilingual LLM benchmarking and organized the WMT25 Multilingual Instruction Shared Task spanning 30 languages and 5 subtasks.
October 30, 2025 at 5:51 PM
Q3: How much do you need to scale when adding languages? (The "curse of multilinguality")

🌟Answer: We derived closed-form equations! To go from K to 4K languages while maintaining performance: scale data by 2.74×, model size by 1.4×.

6/
October 28, 2025 at 2:03 PM
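
A rough back-of-the-envelope reading of the numbers in the post above, as a minimal sketch. The 4×, 2.74× and 1.4× factors come from the post; the helper name `scaling_summary` and the assumption that training compute grows in proportion to model size times training tokens (a common approximation for dense transformers) are illustrative and are not taken from the ATLAS paper.

```python
def scaling_summary(lang_factor: float, data_factor: float, model_factor: float) -> dict:
    """Summarise what a data/model scale-up implies (hypothetical helper)."""
    return {
        # ASSUMPTION: compute ~ parameters * tokens, so the two factors multiply.
        "compute_factor": data_factor * model_factor,
        # Average amount of training data per language, relative to before.
        "data_per_language_factor": data_factor / lang_factor,
    }


# Factors quoted in the post: K -> 4K languages, 2.74x data, 1.4x model size.
summary = scaling_summary(lang_factor=4.0, data_factor=2.74, model_factor=1.4)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

Under these assumptions, total compute grows by about 3.8× while the average data per language drops to roughly 0.69× of its earlier level, so the quoted data requirement grows more slowly than the number of languages.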
📢Thrilled to introduce ATLAS 🗺️: the largest multilingual scaling study to-date—we ran 774 exps (10M-8B params, 400+ languages) to answer:

🌍 Is scaling diff by lang?

🧙‍♂️ Can we model the curse of multilinguality?

⚖️ Pretrain vs finetune from checkpoint?

🔀 X-lingual transfer scores across langs?

1/🧵
October 28, 2025 at 2:03 PM
Shayne Longpre, Sneha Kudugunta, Niklas Muennighoff, I-Hung Hsu, Isaac Caswell, Alex Pentland, Sercan Arik, Chen-Yu Lee, Sayna Ebrahimi
ATLAS: Adaptive Transfer Scaling Laws for Multilingual Pretraining, Finetuning, and Decoding the Curse of Multilinguality
https://arxiv.org/abs/2510.22037
October 28, 2025 at 11:37 AM
Let's talk about eval (automatic or human) and multilinguality at #EMNLP in Suzhou! 🇨🇳

- Efficient evaluation (Nov 5, 16:30, poster session 3)
- MT difficulty (Nov 7, 12:30, findings 3)
- COMET-poly (Nov 8, 11:00, WMT)

(DM to meet 🌿 )
October 28, 2025 at 9:45 AM
Viveiros, Fernandes, Santos, Sannigrahi, Zaranis, Guerreiro, Farajian, Colombo, Neubig, Martins: TowerVision: Understanding and Improving Multilinguality in Vision-Language Models https://arxiv.org/abs/2510.21849 https://arxiv.org/pdf/2510.21849 https://arxiv.org/html/2510.21849
October 28, 2025 at 6:32 AM
Shayne Longpre, et al.: ATLAS: Adaptive Transfer Scaling Laws for Multilingual Pretraining, Finetuning, and Decoding the Curse of Multilinguality https://arxiv.org/abs/2510.22037 https://arxiv.org/pdf/2510.22037 https://arxiv.org/html/2510.22037
October 28, 2025 at 6:30 AM
🚀 We are pleased to announce the First Call for Papers for #WASSA2026

This year, we introduce a Special Track on Multilinguality and Social Bridges between High- & Lesser-Resourced Languages/Communities. 🌍

🗓️ Deadlines: Dec 17 (direct) and Jan 2 (ARR).
🔗 workshop-wassa.github.io/2026/call-fo...
Call for Papers
Workshop on Computational Approaches to Subjectivity, Sentiment & Social Media Analysis
workshop-wassa.github.io
October 21, 2025 at 2:12 PM
Keynote talk: Optimizing Multilinguality Post Training.

Can multilingual ability be boosted at post training?
Julia Kreutzer from @cohereforai.bsky.social explores RL, test-time scaling & data distillation to improve open-ended tasks across languages. 🌍✨

#MELTWorkshop2025
October 10, 2025 at 6:27 PM
In the afternoon, you can find Julia at the MELT workshop (Multilingual and Equitable Language Technologies), where she will talk about optimizing multilinguality post training.
October 10, 2025 at 11:30 AM
And yeah, multilinguality is one thing my extended family doesn't know.

From a practical perspective it doesn't mean much, as I can't speak those yet.
October 7, 2025 at 9:24 AM
The British understanding of the benefits of multilinguality remains poor - quelle surprise!
Katie Lam says there are some schools in London "where 3/4 pupils don't speak English to the standard needed for education"

Are there? London school results are outperforming everywhere else, partly + migration/diversity effects

[CPS is livestreaming this if you want to check the quote. It was at 9:15 am]
October 7, 2025 at 8:33 AM
We are the part of the workforce in our company, mostly European and some Asian, who have the same qualifications as our US American colleagues but, in addition, the better education discussed above, such as multilinguality, which makes us more versatile and more valuable. Is that really THAT difficult to comprehend?
October 7, 2025 at 12:42 AM
this guy doesn't speak or understand a word of Spanish, and multilinguality should be a basic job requirement for these brainless turds if we're gonna be forced to have this agency
September 26, 2025 at 1:06 AM
📢Life update📢

🥳I'm excited to share that I've started as a postdoc at Uppsala University NLP @uppsalanlp.bsky.social, working with Joakim Nivre on topics related to constructions and multilinguality!

🙏Many thanks to the Walter Benjamin Programme of the DFG for making this possible.
September 15, 2025 at 3:10 PM