Lightnews — Scholar-powered news

Jindřich Libovický

@jlibovicky.bsky.social

530 followers 220 following 29 posts

Researcher at Charles University | multilingual natural language processing, machine translation

Jindřich Libovický

@jlibovicky.bsky.social

Cultural awareness is trickier. Different data for different cultures means we can't really compare performance across cultures in a straightforward way. And there's no clear optimization target for cultural awareness beyond curating diverse training data.

October 21, 2025 at 1:30 PM

Jindřich Libovický

@jlibovicky.bsky.social

☝️🧵 Most current approaches emphasize language neutrality: about two-thirds of VL benchmarks use translation-based evaluation. This makes sense because we can explicitly train for language neutrality when we have parallel data. But... 🧵👇

October 21, 2025 at 1:30 PM

Jindřich Libovický

@jlibovicky.bsky.social

Most vision-language models only work in English. We explore how different parallel data types (machine-translated vs authentic captions) affect cross-lingual transfer. Key finding: authentic data can outperform machine translation, and multilingual training beats bilingual approaches. #NLP

September 1, 2025 at 3:38 PM

Jindřich Libovický

@jlibovicky.bsky.social

For evaluation researchers: Simple string-overlap metrics (BLEU, chrF) work surprisingly well for factual QA. 🤔 When answers are mostly named entities, exact matches matter more than we thought.

LLM-as-judge 🦙🧑‍⚖️ correlates best with human judgment, though.

August 25, 2025 at 8:06 AM
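
To make the string-overlap idea concrete, here is a minimal sketch of two such checks for short factual answers: a normalized exact match and a simplified character n-gram F1. This is not the official chrF or BLEU implementation and not the scoring code used in the work above; the function names, the single n-gram order, and the example answers are all illustrative assumptions.

```python
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace (an assumed, very light normalization)."""
    return " ".join(text.lower().split())


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams of the normalized string."""
    normalized = normalize(text)
    return Counter(normalized[i:i + n] for i in range(len(normalized) - n + 1))


def char_ngram_f1(hypothesis: str, reference: str, n: int = 3) -> float:
    """Simplified character n-gram F1 with a single order n (NOT official chrF)."""
    hyp = char_ngrams(hypothesis, n)
    ref = char_ngrams(reference, n)
    # Counter intersection keeps the minimum count of each shared n-gram.
    overlap = sum((hyp & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(hyp.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def exact_match(hypothesis: str, reference: str) -> bool:
    """True when the normalized strings are identical."""
    return normalize(hypothesis) == normalize(reference)


if __name__ == "__main__":
    # Hypothetical model answers scored against a hypothetical reference answer.
    for answer in ["Praha", "in Praha", "Brno"]:
        print(answer, exact_match(answer, "Praha"), round(char_ngram_f1(answer, "Praha"), 3))
```

When the reference answer is a single named entity, an answer that wraps the entity in extra words fails exact match but still receives partial credit from the n-gram score, while an answer naming a different entity scores zero on both.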

Jindřich Libovický

@jlibovicky.bsky.social

The results are... humbling 😅
Even the best models:

>40% accuracy on textual questions
<30% on visual questions
Often perform better in English than the local language (!!)

Visual QA with regional images is especially challenging.

August 25, 2025 at 8:06 AM

Jindřich Libovický

@jlibovicky.bsky.social

The problem: Most QA benchmarks focus on globally known facts. But real users ask about local geography, culture, and history.

We collected questions from native speakers in Czechia 🇨🇿, Slovakia 🇸🇰, and Ukraine 🇺🇦 about facts locals know but outsiders don't.

August 25, 2025 at 8:06 AM

Jindřich Libovický

@jlibovicky.bsky.social

If you're attending the virtual NAACL day on May 6 at 5 pm Central European Time, don't miss @kathaem.bsky.social presenting our work on the importance of semantic token overlap in multilingual language models. aclanthology.org/2025.naacl-s...

Beyond Literal Token Overlap: Token Alignability for Multilinguality

Katharina Hämmerl, Tomasz Limisiewicz, Jindřich Libovický, Alexander Fraser. Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics:...

aclanthology.org

April 30, 2025 at 12:50 PM
