Cohere Labs
@cohereforai.bsky.social
@Cohere.com's non-profit research lab and open science initiative that seeks to solve complex machine learning problems. Join us in exploring the unknown, together. https://cohere.com/research
Pinned
Cohere Labs
@cohereforai.bsky.social
· Jan 15
We are committed to making meaningful progress in machine learning research through open collaboration. Follow this 🧵 to stay on top of our research contributions.
How well do LLMs handle multilinguality? 🌍🤖
🔬 We brought the rigor of Machine Translation evaluation to multilingual LLM benchmarking and organized the WMT25 Multilingual Instruction Shared Task, spanning 30 languages and 5 subtasks.
October 30, 2025 at 5:51 PM
Reposted by Cohere Labs
River, Yinhong and I will all be in person and we look forward to the discussions!
Cohere Labs x EMNLP 2025 "When Personalization Meets Reality: A Multi-Faceted Analysis of Personalized Preference Learning"
Congrats to authors Yijiang River Dong, @tiancheng.bsky.social, Yinhong Liu, Ahmet Üstün, Nigel Collier.
📜 arxiv.org/abs/2502.19158
October 29, 2025 at 9:12 PM
We’re thrilled to announce that some of our research will be presented at @emnlpmeeting.bsky.social next week! 🥳
If you’re attending the conference, don’t miss the chance to explore our work and connect with our team.
October 29, 2025 at 6:31 PM
“Individually, we are one drop. Together, we are an ocean.” - Ryunosuke Satoro ✨
Cohere Labs is excited to announce Connect - a 3-day virtual conference celebrating the power of collaboration in open science!
October 24, 2025 at 10:00 AM
🌍 Most multilingual instruction data starts as English, and translation can’t capture cultural nuance or linguistic richness.
What if we optimized prompts instead of completions?
That’s the focus of our most recent work on prompt space optimization for multilingual synthetic data🗣️
October 23, 2025 at 2:39 PM
Reposted by Cohere Labs
🚀 Global MMLU Lite is now live on Kaggle Benchmarks!
Developed by @cohereforai.bsky.social, it spans 16 languages with both Culturally Sensitive & Agnostic samples - helping researchers uncover cultural & linguistic biases in multilingual evaluation.
Global MMLU Lite Leaderboard | Kaggle
Understanding and Addressing Cultural and Linguistic Biases in Multilingual Evaluation.
www.kaggle.com
October 17, 2025 at 4:18 PM
Global AI deserves reproducible and transparent evaluation. 🌎 With Global MMLU Lite now part of @kaggle.com Benchmarks, you can track the multilingual performance of top models as well as test your own!
Check out the leaderboard and notebook linked below.
October 17, 2025 at 4:00 PM
This month, we've been very excited to welcome Joelle Pineau, @cohere.com's new Chief AI Officer.
We look forward to working together on frontier research - advancing the science of building models that are robust, capable, and impactful in the real world.
October 16, 2025 at 2:19 PM
Reposted by Cohere Labs
Keynote talk: Optimizing Multilinguality Post Training.
Can multilingual ability be boosted during post-training?
Julia Kreutzer from @cohereforai.bsky.social explores RL, test-time scaling & data distillation to improve open-ended tasks across languages. 🌍✨
#MELTWorkshop2025
October 10, 2025 at 6:27 PM
Today at COLM, Cohere Labs Sr Research Scientist @juliakreutzer.bsky.social will be presenting at 2 workshops.
First, the Multilingual Data Quality Signals workshop, bringing together researchers across disciplines to discuss & present research on data quality signals in multilingual data.
October 10, 2025 at 11:30 AM
Today at COLM, we are excited to share our work Déjà Vu: Multilingual LLM Evaluation through the Lens of Machine Translation Evaluation, during Poster Session 4, 4:30 - 6:30pm.
Come connect with paper authors @juliakreutzer.bsky.social and @kocmitom.bsky.social.
October 8, 2025 at 11:30 AM
Reposted by Cohere Labs
💡A collaborative➕diverse team is key. In real life as in the LLM world 💪🦾
Check out our latest work that builds on this insight. 👇
Is Best-of-N really the best use of your inference compute?
Introducing Fusion-of-N: a simple and powerful way to advance inference and distillation beyond Best-of-N.
October 2, 2025 at 2:10 PM
Is Best-of-N really the best use of your inference compute?
Introducing Fusion-of-N: a simple and powerful way to advance inference and distillation beyond Best-of-N.
October 2, 2025 at 10:00 AM
We’re not your average lab. We’re a hybrid research environment dedicated to revolutionizing the ML space.
And we’re hiring a Senior Research Scientist to co-create with us.
If you believe in research as a shared, global effort — this is your chance.
September 30, 2025 at 10:00 AM
What if the way we verify synthetic code is limiting model performance?
In our latest work we uncover the Verification Ceiling Problem: strict “all tests must pass” rules throw away useful data, while weak tests let errors through.
September 29, 2025 at 10:00 AM
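To make the trade-off in this post concrete, here is a minimal Python sketch, assuming a generic pass-rate filter over synthetic code samples. The `Sample` type, the `filter_samples` helper, the thresholds, and the toy pass counts are all hypothetical illustrations; they are not the paper's method, data, or proposed remedy.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """A synthetic code sample and its (hypothetical) unit-test results."""
    name: str
    tests_passed: int
    tests_total: int

    @property
    def pass_rate(self) -> float:
        return self.tests_passed / self.tests_total


def filter_samples(samples: list[Sample], min_pass_rate: float) -> list[str]:
    """Keep samples whose test pass rate meets the threshold; return their names."""
    return [s.name for s in samples if s.pass_rate >= min_pass_rate]


# Toy data (illustrative only):
#   "b" fails 1 of 5 tests but may still be useful training data;
#   "d" passes its only test, so a weak suite lets it through under any threshold.
samples = [
    Sample("a", 5, 5),
    Sample("b", 4, 5),
    Sample("c", 2, 5),
    Sample("d", 1, 1),
]

strict = filter_samples(samples, min_pass_rate=1.0)   # "all tests must pass"
relaxed = filter_samples(samples, min_pass_rate=0.8)  # hypothetical relaxed threshold

print("strict:", strict)
print("relaxed:", relaxed)
```

The strict rule discards `b` even though it nearly works, while neither rule can tell whether `d` is actually correct, because a single weak test says little. That is the tension the post calls the Verification Ceiling Problem.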
Reposted by Cohere Labs
I'm excited to share that I'll be stepping into the role of Head of @cohereforai.bsky.social. It's an honor and a responsibility to lead such an extraordinary group of researchers pushing the boundaries of AI research.
September 5, 2025 at 5:26 PM
Reposted by Cohere Labs
Papers In The Park 14. Last one of the season! Still great weather. Surprising. Anthony is presenting “Why Language Models Hallucinate”.
Thanks to @cohereforai.bsky.social for the copies and pizza.
September 13, 2025 at 4:15 PM
🚨 Rare opportunity: Cohere Labs is hiring a Research Scientist!
If you’re passionate about studying fundamental AI problems and working in a globally collaborative, open-science environment, this is for you.
Apply here: jobs.ashbyhq.com/cohere/7ec9e...
September 24, 2025 at 2:30 PM
Reposted by Cohere Labs
It’s Papers In The Park 7! Thanks to @cohereforai.bsky.social for the papers and the pizza, and to Alvin and Anthony for organizing.
It’s easily one of the most fun paper reads in the city!
July 26, 2025 at 3:32 PM
Reposted by Cohere Labs
Breaking into AI research is harder than ever, and early-career researchers face fewer chances to get started.
Entry points matter.
We started the Scholars Program 3 years ago to give new researchers a real shot — excited to open applications for year 4✨
Applications are now open for the next cohort of the Cohere Labs Scholars Program! 🌟
This is your chance to collaborate with some of the brightest minds in AI & chart new courses in ML research. Let's change the spaces where breakthroughs happen.
Apply by Aug 29.
August 13, 2025 at 2:42 PM
While effective for chess♟️, Elo ratings struggle with LLM evaluation due to volatility and transitivity issues.
New post in collaboration with AI Singapore explores why Elo falls short for AI leaderboards and how we can do better.
August 15, 2025 at 5:04 AM
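A minimal Python sketch of the volatility point, assuming the textbook Elo update with K = 32 and a 1000-point starting rating (common defaults, not values from the post or from the AI Singapore write-up). The `run_elo` helper and the toy match log are hypothetical. They show that two models with an identical 3-3 head-to-head record can end up with either one ranked first, depending only on the order in which the comparisons are processed.

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Textbook Elo expected score of player A against player B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))


def run_elo(matches, k: float = 32.0, initial: float = 1000.0) -> dict:
    """Apply sequential Elo updates; each match is a (winner, loser) pair."""
    ratings: dict = {}
    for winner, loser in matches:
        r_w = ratings.setdefault(winner, initial)
        r_l = ratings.setdefault(loser, initial)
        # Winner gains K times how "surprising" the win was.
        delta = k * (1.0 - expected_score(r_w, r_l))
        ratings[winner] = r_w + delta
        ratings[loser] = r_l - delta  # zero-sum: loser gives up the same amount
    return ratings


# Same head-to-head record (3 wins each), processed in two different orders.
a_first = [("A", "B")] * 3 + [("B", "A")] * 3
b_first = [("B", "A")] * 3 + [("A", "B")] * 3

r1 = run_elo(a_first)
r2 = run_elo(b_first)
print({m: round(r, 1) for m, r in r1.items()})
print({m: round(r, 1) for m, r in r2.items()})
```

Whichever model wins the later games finishes ahead, so the leaderboard reflects match order as much as model quality. Transitivity is a separate limitation: a single scalar rating per model cannot represent a cycle such as A beats B, B beats C, and C beats A.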
Applications are now open for the next cohort of the Cohere Labs Scholars Program! 🌟
This is your chance to collaborate with some of the brightest minds in AI & chart new courses in ML research. Let's change the spaces where breakthroughs happen.
Apply by Aug 29.
August 13, 2025 at 1:32 PM
Can we improve the performance of LLMs during inference without the need for extensive sampling OR special reward models? 🤔
Our latest work introduces a new inference-time scaling recipe that is sample-efficient, multilingual, and suitable for multi-task requirements. 🍋
June 26, 2025 at 4:33 PM
It’s been two years since cross-lingual jailbreaks were first discovered. How far has the multilingual LLM safety research field advanced? 🤔
📏 Our comprehensive survey reveals that there is still a long way to go.
June 3, 2025 at 1:59 PM