Vilém Zouhar #EMNLP
zouharvi.bsky.social
PhD student @ ETH Zürich | all aspects of NLP but mostly evaluation and MT | go vegan | https://vilda.net
Let's talk about eval (automatic or human) and multilinguality at #EMNLP in Suzhou! 🇨🇳

- Efficient evaluation (Nov 5, 16:30, poster session 3)
- MT difficulty (Nov 7, 12:30, findings 3)
- COMET-poly (Nov 8, 11:00, WMT)

(DM to meet 🌿 )
October 28, 2025 at 9:45 AM
Grateful to receive the Google PhD Fellowship in NLP! 🙂

I am not secretive about having applied to 4 similar fellowships during my PhD before and not succeeding. Still, refining my research statement (part of the application) helped me tremendously in finding out the...

inf.ethz.ch/news-and-eve...
Google PhD Fellowships 2025
Yutong Chen, Benedict Schlüter and Vilém Zouhar, all three of them doctoral students at the Department of Computer Science, have been awarded the Google PhD Fellowship. The programme was created to re...
inf.ethz.ch
October 24, 2025 at 12:32 PM
📢 Announcing the First Workshop on Multilingual and Multicultural Evaluation (MME) at #EACL2026 🇲🇦

MME focuses on resources, metrics & methodologies for evaluating multilingual systems! multilingual-multicultural-evaluation.github.io

📅 Workshop Mar 24–29, 2026
🗓️ Submit by Dec 19, 2025
October 20, 2025 at 10:37 AM
My two biggest takeaways are:
- Standard testsets are too easy (Figure 1).
- We can make testsets that are not easy (Figure 2). 😎
September 16, 2025 at 8:49 AM
Reposted by Vilém Zouhar #EMNLP
Participation kept growing this year: 36 unique teams competed to improve the performance of MT. Furthermore, we added the collected outputs of 24 popular LLMs and online systems, reaching 50 evaluated systems in our annual benchmark.
August 23, 2025 at 9:28 AM
The 2025 MT Evaluation shared task brings together the strengths of the previous Metrics and Quality Estimation tasks under a single, unified evaluation framework.

The following tasks are now open for participants (deadline July 31st but participation has never been easier 🙂 ):
July 25, 2025 at 4:59 PM
You have a budget to human-evaluate 100 inputs to your models, but your dataset is 10,000 inputs. Do not just pick 100 randomly!🙅

We can do better. "How to Select Datapoints for Efficient Human Evaluation of NLG Models?" shows how.🕵️
(random is still a devilishly good baseline)
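One simple alternative to uniform random sampling (a hypothetical illustration, not necessarily the paper's method): spend the human budget on the inputs where automatic metric scores disagree most across systems, since those are the most informative to annotate. The function name and the variance heuristic below are my own assumptions.

```python
import statistics

def select_for_human_eval(metric_scores, budget=100):
    """Pick the inputs whose automatic scores vary most across systems.

    metric_scores: dict mapping input_id -> list of per-system metric scores.
    (Illustrative sketch; the selection criterion is an assumption.)
    """
    ranked = sorted(
        metric_scores,
        key=lambda item: statistics.pvariance(metric_scores[item]),
        reverse=True,
    )
    return ranked[:budget]

# Toy usage: input "a" shows the most cross-system disagreement.
scores = {"a": [0.1, 0.9], "b": [0.5, 0.5], "c": [0.4, 0.6]}
print(select_for_human_eval(scores, budget=2))  # → ['a', 'c']
```

As the post notes, random selection remains a strong baseline; a heuristic like this is only worthwhile if it measurably beats it.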
July 15, 2025 at 1:03 PM
TIL that since Python 3.4 there's a `statistics` module in the standard library with mean, mode, quantiles, variance, covariance, correlation, z-scores, and more (some helpers arrived later, e.g. covariance and correlation in 3.10). No more needless numpy imports!
July 9, 2025 at 12:49 AM
Past iterations of the Terminology Shared Task don't come anywhere near the data quality and evaluation scrutiny of this one. In the era of LLM-as-MTs, participation has never been easier!
📣Take part in 3rd Terminology shared task @WMT!📣
This year:
👉5 language pairs: EN->{ES, RU, DE, ZH},
👉2 tracks - sentence-level and doc-level translation,
👉authentic data from 2 domains: finance and IT!

www2.statmt.org/wmt25/termin...

Don't miss the opportunity - we only run it once every two years😏
Terminology Translation Task
www2.statmt.org
July 7, 2025 at 2:34 PM
Thank you for your response. I will keep my score.
July 3, 2025 at 6:50 PM
For the longest time I've been using Google Translate as a gateway to explain machine translation concepts to people as it's a tool that everyone knows. Now I get to contribute over the summer. 🌞

If you're near Mountain View, let's talk evaluation. 📏
July 3, 2025 at 4:15 AM
The arXiv submission process got an update!
(still requires a manual bbl)
May 31, 2025 at 9:56 AM
Reposted by Vilém Zouhar #EMNLP
XCOMETs underperform because they do not match translators' subjective error-annotation propensity. Using the granular p(error) value from XCOMET significantly boosts their performance when calibration is possible → desirable for a fair evaluation 6/
May 30, 2025 at 2:28 PM
Reposted by Vilém Zouhar #EMNLP
Key takeaways for WQE evals:
1️⃣ Unsupervised WQE shows promise (esp. uncertainty-based metrics); interpretability-based approaches remain under-explored for MT
2️⃣ Calibration sets can help to ensure fair evaluations.
3️⃣ Use multiple annotators for robust rankings.

More info ➡️ arxiv.org/abs/2505.23183 8/8
Unsupervised Word-level Quality Estimation for Machine Translation Through the Lens of Annotators (Dis)agreement
Word-level quality estimation (WQE) aims to automatically identify fine-grained error spans in machine-translated outputs and has found many uses, including assisting translators during post-editing. ...
arxiv.org
May 30, 2025 at 2:28 PM
Reposted by Vilém Zouhar #EMNLP
📢 New paper: Can unsupervised metrics extracted from MT models detect their translation errors reliably? Do annotators even *agree* on what constitutes an error? 🧐

We compare uncertainty- and interp-based WQE metrics across 12 directions, with some surprising findings!

🧵 1/
May 30, 2025 at 2:28 PM
incredible monetization opportunity

(this is a joke)
May 14, 2025 at 8:52 AM
Ever looked down from a hot air balloon and despaired at how expensive it is to run thorough human evaluation of machine translation?
Fret no more and come tomorrow at 11:00 to Hall 3 #NAACL2025.
May 2, 2025 at 12:30 AM
Being in a hot air balloon in Albuquerque really makes one ponder *how to efficiently pick the best translation candidate without running expensive evaluation metrics on all of them.*

See you tomorrow at 9:00 in Hall 3 #NAACL2025.
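The generic idea behind efficient candidate selection can be sketched in two stages (a hypothetical illustration under my own assumptions, not the paper's actual algorithm): prune the pool with a cheap proxy score, then spend the expensive metric only on the shortlist.

```python
# Hypothetical two-stage selection sketch (illustrative, not the paper's
# method): rank all candidates with a cheap proxy, then run the expensive
# metric only on the top-k survivors.
def pick_best(candidates, cheap_score, expensive_score, k=5):
    shortlist = sorted(candidates, key=cheap_score, reverse=True)[:k]
    return max(shortlist, key=expensive_score)

# Toy usage with numeric "candidates" and identity scorers.
best = pick_best([0.2, 0.9, 0.5, 0.7], lambda c: c, lambda c: c, k=2)
print(best)  # 0.9
```

With n candidates and shortlist size k, the expensive metric runs k times instead of n, which is where the savings come from.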
May 2, 2025 at 12:26 AM
Let's chat at #NAACL2025 about evaluation et al! ⚗️
April 22, 2025 at 7:35 AM
(Automatic) span annotations are the future of evaluation and diagnosis in NLP! 🖊️
April 15, 2025 at 11:30 AM
Reposted by Vilém Zouhar #EMNLP
How do LLMs compare to human crowdworkers in annotating text spans? 🧑🤖

And how can span annotation help us with evaluating texts?

Find out in our new paper: llm-span-annotators.github.io

Arxiv: arxiv.org/abs/2504.08697
Large Language Models as Span Annotators
Website for the paper Large Language Models as Span Annotators
llm-span-annotators.github.io
April 15, 2025 at 11:10 AM
In the true sense of the word I am humbled to have been rejected by a few fellowships this year.
April 11, 2025 at 7:21 AM
Overall 3.0 (borderline reject) but with 43% tariff adjustments it's 4.5 (borderline award).
April 3, 2025 at 7:56 AM
Panopticon, but instead of prison cells, it's a stack of Overleaf tabs. You can’t watch them all at once, but at any moment, you *could* be watching any of them. And they know it.
March 25, 2025 at 9:48 AM