Lightnews — Scholar-powered news

Vitalii Hirak

@v-hirak.bsky.social

PhD student in Natural Language Processing and Information Retrieval at the University of Düsseldorf. Working in the EmergentIR project at GESIS Cologne.


Vitalii Hirak

@v-hirak.bsky.social

6/6: We hope our work will inspire further research on the intrinsic difficulty of translating and generating different languages in the age of LLMs, particularly through experimentation with alternative decoding strategies.

For now, I'm looking forward to presenting our work in Rabat, Morocco 🇲🇦

February 8, 2026 at 4:56 PM

Vitalii Hirak

@v-hirak.bsky.social

5/6: In the context of searching for the model’s highest-probability translation, we found that languages with more complex morphology and flexible word order benefit more from wider beam sizes.

In other words, the standard practice of left-to-right beam search may be suboptimal for these languages.
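
To illustrate why beam width can matter when searching for the highest-probability output, here is a minimal sketch of left-to-right beam search over a toy model. The `TOY_MODEL` table, its tokens, and its probabilities are hypothetical and chosen only for illustration; this is not the setup used in the paper.

```python
import math

# Hypothetical toy model: maps a prefix (tuple of tokens) to the
# probabilities of each possible next token. Values are made up.
TOY_MODEL = {
    (): {"a": 0.6, "b": 0.4},
    ("a",): {"x": 0.55, "y": 0.45},
    ("b",): {"x": 0.9, "y": 0.1},
}


def beam_search(model, beam_size, length):
    """Return the best (sequence, log-probability) pair found with a given beam size."""
    beams = [((), 0.0)]  # start from the empty prefix with log-probability 0
    for _ in range(length):
        candidates = []
        for prefix, score in beams:
            # Extend every surviving prefix by every possible next token.
            for token, prob in model[prefix].items():
                candidates.append((prefix + (token,), score + math.log(prob)))
        # Keep only the beam_size highest-scoring extensions.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_size]
    return beams[0]


greedy_seq, greedy_score = beam_search(TOY_MODEL, beam_size=1, length=2)
wide_seq, wide_score = beam_search(TOY_MODEL, beam_size=2, length=2)
print(greedy_seq, round(math.exp(greedy_score), 2))
print(wide_seq, round(math.exp(wide_score), 2))
```

In this toy case, a beam of size 1 commits to the locally best first token and ends at a sequence with probability 0.33, while a beam of size 2 keeps the second-best first token alive and reaches a sequence with probability 0.36.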

February 8, 2026 at 4:56 PM

Vitalii Hirak

@v-hirak.bsky.social

4/6: Through correlation and regression experiments, we found that language properties like typological distance, type/token ratio, and head-finality drive translation quality of both NMT models, even after controlling for more trivial factors such as language resourcedness and script similarity.

Spearman correlations between continuous language properties and NLLB-200 chrF++ translation quality scores at beam size k = 5. Source language is English. Sample sizes (i.e. number of target languages) for each property are indicated next to their respective bars. Correlations significant at p < 0.05 are marked with *, at p < 0.01 with **, at p < 0.001 with ***.
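
For readers unfamiliar with the two quantities named above, here is a minimal standard-library sketch of a type/token ratio and a Spearman rank correlation. The function names and all numbers are hypothetical toy values for illustration, not data or results from the paper, and the paper's own computation may differ (for example in tokenization).

```python
from statistics import mean


def type_token_ratio(tokens):
    """Number of distinct tokens divided by the total number of tokens."""
    return len(set(tokens)) / len(tokens)


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Toy values only: one language property and one quality score per language.
toy_property = [0.10, 0.25, 0.40, 0.55]
toy_quality = [60.0, 52.0, 47.0, 41.0]
rho = spearman(toy_property, toy_quality)
ttr = type_token_ratio("a b a c".split())
```

Because Spearman works on ranks rather than raw values, it captures any monotonic relationship between a property and a score, not only a linear one.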

February 8, 2026 at 4:56 PM

Vitalii Hirak

@v-hirak.bsky.social

3/6: We analyze 2 NMT models, NLLB-200 and Tower+.

Although current SOTA has shifted to prompting decoder-only LLMs such as Tower+, we find that NLLB achieves higher chrF++ scores on all languages outside Tower's coverage, reaffirming the relevance of encoder-decoders for low-resourced languages.

Tower+ 9B chrF++ scores vs. NLLB-200 3.3B chrF++ scores at beam size k = 7. Each point denotes a language pair and is colored by source language, while ▼ denotes target languages officially supported by Tower+. The blue and orange shaded regions indicate language pairs for which either NLLB-200 or Tower+ scores are higher, respectively. Sample size is n = 7 × 52 = 364.

February 8, 2026 at 4:56 PM

Vitalii Hirak

@v-hirak.bsky.social

2/6: First, we compile a broad set of fine-grained typological and morphosyntactic features for 212 languages in the FLORES+ MT benchmark. We release this set publicly: github.com/v-hirak/expl...

February 8, 2026 at 4:56 PM

Vitalii Hirak

@v-hirak.bsky.social

Henry Cavill is a creep though

June 17, 2025 at 10:54 AM

Vitalii Hirak

@v-hirak.bsky.social

They aren't canonizing anything, this show is gonna be as canon as the millions of other people's playthroughs. It's just their take on the story

June 17, 2025 at 10:53 AM

Vitalii Hirak

@v-hirak.bsky.social

Thank you from a Ukrainian, Kala, sincerely 🙏 I love your Mass Effect content

February 28, 2025 at 10:34 PM
