⚠️MLAMA is full of disfluent sentences
❓Reason: templated translation
💡Simple full-sentence translation improves factual retrieval by up to 25%
🙌Remember to check your benchmarks with speakers!
Link: arxiv.org/pdf/2510.15115
Highlights:
- sentence translation seems solvable, document translation is still challenging
- better systems benefit more from proper terminologies
- term-based metrics correlate poorly with general translation quality
www2.statmt.org/wmt25/pdf/20...
InCa and InDia: more stable and interpretable tokenizer preprocessing that handles casing and diacritization!
Check out our:
💻package: github.com/Kiryukhaseme...
🎥video: www.youtube.com/watch?v=XgDP...
📝paper: openreview.net/pdf?id=9GwVW...
This year:
👉5 language pairs: EN->{ES, RU, DE, ZH},
👉2 tracks - sentence-level and doc-level translation,
👉authentic data from 2 domains: finance and IT!
www2.statmt.org/wmt25/termin...
Don't miss the opportunity - we only run it once every two years😏