Cohere Labs
@cohereforai.bsky.social
@Cohere.com's non-profit research lab and open science initiative that seeks to solve complex machine learning problems. Join us in exploring the unknown, together. https://cohere.com/research
... @markusfreitag.bsky.social, Roman Grundkiewicz, @yupenghou.bsky.social, @phikoehn.bsky.social, @juliakreutzer.bsky.social, Saab Mansour, @sted19.bsky.social, Lorenzo Proietti, Parker Riley, Eduardo Sánchez, @patuchen.bsky.social, Mariya Shmatova, @zouharvi.bsky.social
October 30, 2025 at 5:51 PM
You can find all details in our paper www2.statmt.org/wmt25/pdf/20... or discuss with us next week at the WMT Conference at #EMNLP2025.

Led by @kocmitom.bsky.social, Ekaterina Artemova, Eleftherios Avramidis, Eleftheria Briakou, @pinzhen.bsky.social, @mziizm.bsky.social...
October 30, 2025 at 5:51 PM
⚖️ LLM-as-a-judge: mixed reliability.

Top systems reach ~95% pairwise accuracy on open-ended and summarization tasks.
Smaller ones barely beat coin-flip territory at ~55%.
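
For intuition, pairwise accuracy here is just agreement with human preference over output pairs. A minimal sketch, where `examples` and `judge_prefers` are hypothetical placeholders (not the shared task's actual interface):

```python
# Minimal sketch of judge-human pairwise accuracy; `examples` and
# `judge_prefers` are hypothetical stand-ins, not the shared task's API.

def pairwise_accuracy(examples, judge_prefers):
    """examples: iterable of (prompt, output_a, output_b, human_choice)
    tuples, where human_choice is "A" or "B" from human evaluation."""
    hits = total = 0
    for prompt, out_a, out_b, human_choice in examples:
        # The judge model is asked which of the two outputs is better.
        hits += judge_prefers(prompt, out_a, out_b) == human_choice
        total += 1
    return hits / total  # ~0.95 for top judges; ~0.55 is near coin-flip
```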
October 30, 2025 at 5:51 PM
🤖 Naturalness is still a significant challenge.

Across open-ended generation and cross-lingual summarization, the biggest weakness isn't coherence or accuracy but sounding like a native speaker. Many outputs still feel robotic or translated.
October 30, 2025 at 5:51 PM
🧠 English isn't always easiest.

Models like Gemini 2.5 Pro and Claude 4 sometimes did better in Korean, German, or Spanish than in English when solving reasoning tasks.
October 30, 2025 at 5:51 PM
🧩 Linguistic reasoning remains the toughest nut. 🥥

Even top models scored below 50% on linguistic reasoning tasks, showing that structured linguistic deduction is still an open challenge.
October 30, 2025 at 5:51 PM
🌐 Language coverage matters.

Models don’t support all languages equally, and this skews rankings. Smaller open models especially struggle with broad coverage, affecting their aggregate ranking ⚠️
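
A toy numeric illustration of that skew (hypothetical scores, not the shared task's scoring rule): a model that wins every language it supports can still fall behind once unsupported languages count against its average.

```python
# Toy illustration of coverage skew: model_b beats model_a on every
# language it supports, but missing a language drags its aggregate down.
scores = {
    "model_a": {"en": 0.70, "de": 0.65, "ko": 0.60},  # broad coverage
    "model_b": {"en": 0.80, "de": 0.75},              # no "ko" support
}
languages = ["en", "de", "ko"]

for name, per_lang in scores.items():
    # Unsupported languages score 0.0 in this toy aggregate.
    aggregate = sum(per_lang.get(lang, 0.0) for lang in languages) / len(languages)
    print(name, round(aggregate, 3))
# model_a 0.65 vs model_b 0.517: coverage, not per-language quality,
# decides the ordering.
```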
October 30, 2025 at 5:51 PM
🧩 Linguistic reasoning on unseen languages
📝 Open-ended generation testing naturalness and usefulness
📘 Cross-lingual summarization
🔁 Machine translation
🧑‍⚖️ LLM-as-a-Judge evaluating outputs of other models

All backed by human evals and public releases of data + outputs!
github.com/wmt-conferen...
October 30, 2025 at 5:51 PM
Cohere Labs x EMNLP 2025: "When Life Gives You Samples: The Benefits of Scaling up Inference Compute for Multilingual LLMs"

Congrats to authors Ammar Khairi, Daniel D'souza, Ye Shen, @juliakreutzer.bsky.social, @sarahooker.bsky.social

📜 arxiv.org/abs/2506.20544
October 29, 2025 at 6:31 PM
Cohere Labs x EMNLP 2025 "When Personalization Meets Reality: A Multi-Faceted Analysis of Personalized Preference Learning"

Congrats to authors Yijiang River Dong, @tiancheng.bsky.social, Yinhong Liu, Ahmet Üstün, Nigel Collier.

📜 arxiv.org/abs/2502.19158
October 29, 2025 at 6:31 PM
Cohere Labs x EMNLP 2025: "The State of Multilingual LLM Safety Research: From Measuring The Language Gap To Mitigating It"

Congrats to authors @yongzx.bsky.social, Beyza Ermis, @mziizm.bsky.social, Stephen Bach, @juliakreutzer.bsky.social.

📜 arxiv.org/abs/2505.24119
October 29, 2025 at 6:31 PM
Cohere Labs x EMNLP 2025: "Nexus: Adaptive Upcycling to Efficiently Pretrain Mixture of Experts"

Congrats to authors Nikolas Gritsch, Qizhen Zhang, @acyrl.bsky.social, @sarahooker.bsky.social and Ahmet Üstün.

📜 arxiv.org/abs/2408.15901
October 29, 2025 at 6:31 PM
We're excited to hear from speakers including Ivan Zhang, Joelle Pineau, Marzieh Fadaee, Shayne Longpre and 20+ other presenters who will share insights on open science, collaborative research, and community-driven innovation.

Learn more and register now: https://tinyurl.com/CohereLabsConnect
October 24, 2025 at 10:00 AM
Join us for inspiring keynotes, lightning talks, and interactive sessions that bring together curious minds from around the world. Throughout the conference, we’ll:

🔬 Showcase cutting-edge research
💡 Highlight meaningful collaborations
🤝 Inspire new partnerships
October 24, 2025 at 10:00 AM
With this work we take a step toward principled approaches to multilingual synthetic data generation—an essential direction for developing adaptive, culturally aware, and globally capable language models. 🚀
October 23, 2025 at 2:39 PM
We also evaluated our method on languages not seen during pre-training 🌍: while performance is higher for seen languages, our transformations significantly improve both groups over the baseline, and in some cases are competitive with the teacher model 📈 (over 3x the student's size).
October 23, 2025 at 2:39 PM
📊 By inspecting the data itself, we see clear gains in quality along the targeted dimensions. Even when the interventions are relatively small, they produce substantial changes in completions, improving their fluency, diversity, and difficulty ✨
October 23, 2025 at 2:39 PM
⛰️ With these simple transformations, we're able to obtain consistent improvements across our 12 target languages and a diverse set of benchmarks, with particularly pronounced gains on open-ended tasks, our best proxies for real human use 💬
October 23, 2025 at 2:39 PM
Relying only on translation often yields unnatural, Western-centric, and linguistically flat prompts.
💡 We propose a simple, easy-to-implement solution to this problem:
🌐 Transform translated prompts along three axes: Naturalization, Cultural Adaptation, and Difficulty.
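
A hypothetical sketch of what applying those three transformations could look like; the instruction strings and `call_llm` (any text-in/text-out LLM call) are illustrative placeholders, not the paper's actual prompts or pipeline:

```python
# Hypothetical sketch of the three-axis prompt transformation idea.
# `call_llm` is a placeholder for any chat-completion-style API.

TRANSFORMS = {
    "naturalization": (
        "Rewrite the following prompt so it reads as if originally "
        "written by a native {lang} speaker:"
    ),
    "cultural_adaptation": (
        "Adapt names, places, and cultural references in the following "
        "prompt so they feel natural to {lang} speakers:"
    ),
    "difficulty": (
        "Rewrite the following prompt to be more challenging while "
        "preserving its topic and intent:"
    ),
}

def transform(translated_prompt: str, lang: str, call_llm) -> str:
    """Apply the three transformations in sequence to one translated prompt."""
    prompt = translated_prompt
    for instruction in TRANSFORMS.values():
        prompt = call_llm(instruction.format(lang=lang) + "\n\n" + prompt)
    return prompt
```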
October 23, 2025 at 2:39 PM