Tom Kocmi
kocmitom.bsky.social
Researcher at Cohere | Multilingual LLM evaluation
This project wouldn’t have been possible without the brilliant minds driving the work: Lorenzo Proietti, @sted19.bsky.social and @zouharvi.bsky.social
September 16, 2025 at 9:51 AM
One way to raise the bar is by rethinking the source selection process: instead of sampling randomly, we built a model that chooses the most difficult data for translation. And we’ve already put our work into practice: this year’s WMT25 General MT test set uses our approach to make evaluation more challenging.
September 16, 2025 at 9:51 AM
Oh, and the best part: we’re releasing the weights so researchers can run wild with it. Stay tuned for our upcoming technical report!

cohere.com/blog/command...
Command A Translate: Secure translation for global enterprises
The new industry standard for secure, enterprise-ready machine translation.
cohere.com
August 28, 2025 at 7:55 PM
A correction: we received 22 multilingual systems but only 14 bilingual ones, highlighting a shift in the field towards multilinguality.
August 26, 2025 at 7:06 PM
We received 14 specialized systems while 10 are multilingual. And almost all participants finetuned some LLMs.

In contrast to previous years, constrained systems are now reaching top-tier rankings, challenging the dominance of unconstrained ones.

Stay tuned for the 20th anniversary WMT conference.
August 23, 2025 at 9:28 AM
Participation grew again this year: 36 unique teams competed to improve the performance of MT. Furthermore, we added the collected outputs of 24 popular LLMs and online systems, reaching 50 evaluated systems in our annual benchmark.
August 23, 2025 at 9:28 AM
* Revamped constrained track – No restrictions on training data except licensing; all open models under 20B parameters are allowed.

* More challenging sources; long-context translation; prompt preambles; and much more.

📌 All details are available at www2.statmt.org/wmt25/transl...
Shared Task: General Machine Translation
www2.statmt.org
February 20, 2025 at 9:31 PM
* New human-evaluated language pairs: EN–Arabic, EN–Estonian, EN–Korean, EN–Serbian, Czech–German, Bhojpuri–EN, Maasai–EN

* New multilingual subtask – Can you build a system that translates 30 languages?

* New modalities – Additional context from video and image (text-to-text remains the core).
February 20, 2025 at 9:31 PM
Yeah, I haven't written a paper since it's just a different prompt. It's published in the GitHub repository of GEMBA.
February 9, 2025 at 10:14 AM
That one is extremely large, but we haven't used it in the automatic ranking either. Unfortunately, I'm not aware of any API service for metrics.
February 8, 2025 at 11:44 AM
🙏 A huge thank you to all organizers, partners, and participants for making this year's WMT General MT Shared Task a success! Stay tuned for WMT25 - many exciting changes are coming! 🎉
November 20, 2024 at 10:16 AM
🏆 Highlights from top systems:
✅ IOL-Research: led in constrained/open, winning 10/11 in its category.
✅ Unbabel-Tower70B: Best participant, winning 8/11 pairs.
✅ Claude-3.5-Sonnet: Best overall with 9/11 wins.
✅ Shoutout to Dubformer (speech) & CUNI-MH (strong constrained)
November 20, 2024 at 10:16 AM