Stefano
sted19.bsky.social
PhD Student @SapienzaNLP
Applied Scientist Intern @Amazon Madrid
A huge thanks to my fantastic co-authors: Lorenzo Proietti, @zouharvi.bsky.social, Roberto Navigli, and @kocmitom.bsky.social. 👏

#AI #NLProc #Evaluation
September 16, 2025 at 8:46 AM
🤖 We release our best models, sentinel-src-24 and sentinel-src-25! Use them to build more robust evaluations, filter data, or explore applications in other areas such as curriculum learning.
September 16, 2025 at 8:46 AM
🔍 Our most surprising finding? LLM-based methods struggle with this task, performing worse than even simple heuristics like sentence length. In contrast, our specialized, trained models are the clear winners.
September 16, 2025 at 8:46 AM
In our paper, we:
1️⃣ Define the task and introduce Difficulty Estimation Correlation to evaluate difficulty estimators.
2️⃣ Benchmark a wide range of methods, establishing the first SOTA.
3️⃣ Demonstrate their effectiveness in building more challenging test sets automatically.
September 16, 2025 at 8:46 AM
💡 Our solution: increase benchmark difficulty!

What if we could predict in advance which texts are hard to translate? We introduce Translation Difficulty Estimation as a novel task to automatically identify challenging texts for MT systems.
September 16, 2025 at 8:46 AM