Tom Kocmi

@kocmitom.bsky.social

This project wouldn’t have been possible without the brilliant minds driving the work: Lorenzo Proietti, @sted19.bsky.social and @zouharvi.bsky.social

September 16, 2025 at 9:51 AM

One way to raise the bar is to rethink how source data is selected: instead of random samples, we built a model that chooses the most difficult data for translation. We've already put this work into practice: this year's WMT25 General MT test set uses our approach to make evaluation more challenging.

September 16, 2025 at 9:51 AM

Oh, and the best part: we’re releasing the weights so researchers can run wild with it. Stay tuned for our upcoming technical report!

cohere.com/blog/command...

Command A Translate: Secure translation for global enterprises

The new industry standard for secure, enterprise-ready machine translation.

cohere.com

August 28, 2025 at 7:55 PM

A correction: we received 22 multilingual systems but only 14 bilingual ones, highlighting the field's shift towards multilinguality.

August 26, 2025 at 7:06 PM

We received 14 specialized systems and 10 multilingual ones, and almost all participants fine-tuned LLMs.

In contrast to previous years, constrained systems are now reaching top-tier rankings, challenging the dominance of unconstrained ones.

Stay tuned for the 20th anniversary WMT conference.

August 23, 2025 at 9:28 AM

Participation kept growing this year: 36 unique teams competed to improve MT performance. We also collected outputs from 24 popular LLMs and online systems, reaching 50 evaluated systems in our annual benchmark.

August 23, 2025 at 9:28 AM

* Revamped constrained track – No restrictions on training data except licensing; all open models under 20B parameters are allowed.

* More challenging sources; long-context translation; prompt preambles; and much more.

📌 All details are available at www2.statmt.org/wmt25/transl...

Shared Task: General Machine Translation

www2.statmt.org

February 20, 2025 at 9:31 PM

* New human-evaluated language pairs: EN–Arabic, EN–Estonian, EN–Korean, EN–Serbian, Czech–German, Bhojpuri–EN, Maasai–EN

* New multilingual subtask – Can you build a system that translates 30 languages?

* New modalities – Additional context from video and image (text-to-text remains the core).

February 20, 2025 at 9:31 PM

Yeah, I haven't written a paper, since it's just a different prompt. It's published in the GEMBA GitHub repository.

February 9, 2025 at 10:14 AM

That one is extremely large, and we haven't used it in the automatic ranking either. Unfortunately, I'm not aware of any API service for metrics.

February 8, 2025 at 11:44 AM

🙏 A huge thank you to all organizers, partners, and participants for making this year's WMT General MT Shared Task a success! Stay tuned for WMT25 - many exciting changes are coming! 🎉

November 20, 2024 at 10:16 AM

🏆 Highlights from top systems:
✅ IOL-Research: led in constrained/open, winning 10/11 in its category.
✅ Unbabel-Tower70B: Best participant, winning 8/11 pairs.
✅ Claude-3.5-Sonnet: Best overall with 9/11 wins.
✅ Shoutout to Dubformer (speech) & CUNI-MH (strong constrained)

November 20, 2024 at 10:16 AM
