@cointegrated.bsky.social
Adding a bunch of tags for discoverability: #machinetranslation #flores #seed #languages #multilinguality #ai #nlp #mt
July 5, 2025 at 1:18 PM
The Seed training dataset also received a few submissions, including new translations into Spanish and Italian (from which it might be easier to translate into lower-resourced languages).
July 5, 2025 at 1:17 PM
BTW, last year, as part of the previous shared task (aclanthology.org/2024.wmt-1.4), FLORES+ was extended with the languages Emakhuwa, Erzya, Tuvan, Karakalpak, Aragonese, Aranese, Asturian, Valencian, and Wu Chinese, and received a number of edits to other languages.
July 5, 2025 at 1:16 PM
What to do now?
- Download the dataset and benchmark multilingual models: huggingface.co/datasets/ope...
- Subscribe to our newsletter: openlanguagedata.substack.com/about
- Participate in the WMT25 Open Data shared task to enrich open datasets with new languages: www2.statmt.org/wmt25/open-d...
July 5, 2025 at 1:15 PM