Stefano

@sted19.bsky.social

📄 Paper: arxiv.org/abs/2508.10175

🤗 Models: huggingface.co/collections/...

💻 Code: github.com/zouharvi/tra...

Estimating Machine Translation Difficulty

Machine translation quality has steadily improved over the years, achieving near-perfect translations in recent benchmarks. These high-quality outputs make it difficult to distinguish between state-of...

arxiv.org

September 16, 2025 at 8:46 AM

Stefano

@sted19.bsky.social

A huge thanks to my fantastic co-authors: Lorenzo Proietti, @zouharvi.bsky.social, Roberto Navigli, and @kocmitom.bsky.social. 👏

#AI #NLProc #Evaluation

September 16, 2025 at 8:46 AM

Stefano

@sted19.bsky.social

🤖 We release our best models, sentinel-src-24 and sentinel-src-25! Use them to build more robust evaluations, filter data, or explore applications in other areas such as curriculum learning.

September 16, 2025 at 8:46 AM

Stefano

@sted19.bsky.social

🔍 Our most surprising finding? LLM-based methods struggle with this task, performing worse than even simple heuristics like sentence length. In contrast, our specialized, trained models are the clear winners.

September 16, 2025 at 8:46 AM
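
To make the sentence-length baseline mentioned in the post above concrete, here is a minimal sketch. It scores each source text by its token count and checks how well that ranking agrees with translation quality using a hand-rolled Kendall tau-a. The toy sentences, the quality scores, and the function names are all hypothetical. Kendall tau is a generic stand-in and is not claimed to be the paper's Difficulty Estimation Correlation.

```python
from itertools import combinations


def length_difficulty(source: str) -> int:
    """Baseline difficulty score: number of whitespace-separated tokens."""
    return len(source.split())


def kendall_tau_a(x, y):
    """Kendall tau-a: (concordant - discordant) / total number of pairs."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally sized sequences with at least 2 items")
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical toy data: pre-tokenized source texts and made-up quality
# scores (higher = better translation) for some imagined MT system.
sources = [
    "Hello .",
    "The cat sat on the mat .",
    "Despite the rain , the committee , which had met twice , postponed the vote again .",
]
quality = [98.0, 90.0, 71.0]

difficulty = [length_difficulty(s) for s in sources]

# A good difficulty estimator should rank texts in the opposite order of
# translation quality, so we correlate difficulty with negated quality.
tau = kendall_tau_a(difficulty, [-q for q in quality])
print(f"Kendall tau-a between length and (negated) quality: {tau:.2f}")
```

On real data, the same comparison could be run with any other estimator swapped in for `length_difficulty`, which is roughly how a heuristic can be set against a trained model.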

Stefano

@sted19.bsky.social

In our paper, we:
1️⃣ Define the task and introduce Difficulty Estimation Correlation to evaluate difficulty estimators.
2️⃣ Benchmark a wide range of methods, establishing the first state of the art.
3️⃣ Demonstrate the effectiveness of difficulty estimators in automatically building more challenging test sets.

September 16, 2025 at 8:46 AM
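
As a rough illustration of the third point, building a harder test set can be as simple as ranking candidate source texts by an estimated difficulty and keeping the top k. The sketch below assumes a generic `estimate_difficulty` callable. The token-count estimator used here is only a placeholder, and `select_hardest` is a hypothetical helper, not the released models' interface or the paper's procedure.

```python
from typing import Callable, List


def select_hardest(
    sources: List[str],
    estimate_difficulty: Callable[[str], float],
    k: int,
) -> List[str]:
    """Return the k source texts with the highest estimated difficulty."""
    # sorted() is stable, so ties keep their original relative order.
    ranked = sorted(sources, key=estimate_difficulty, reverse=True)
    return ranked[:k]


def token_count(text: str) -> float:
    """Placeholder estimator (an assumption): longer texts count as harder."""
    return float(len(text.split()))


# Hypothetical candidate pool
candidates = ["a b c", "a", "a b", "a b c d e"]
hard_subset = select_hardest(candidates, token_count, k=2)
print(hard_subset)
```

A trained difficulty estimator would replace `token_count` here; the selection step itself stays the same.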

Stefano

@sted19.bsky.social

💡 Our solution: increase benchmark difficulty!

What if we could predict in advance which texts are hard to translate? We introduce Translation Difficulty Estimation as a novel task to automatically identify challenging texts for MT systems.

September 16, 2025 at 8:46 AM
