Jindřich Libovický
jlibovicky.bsky.social
Researcher at Charles University | multilingual natural language processing, machine translation
Cultural awareness is trickier. Different data for different cultures means we can't really compare performance across cultures in a straightforward way. And there's no clear optimization target for cultural awareness beyond curating diverse training data.
October 21, 2025 at 1:30 PM
☝️🧵 Most current approaches emphasize language neutrality: about two-thirds of vision-language (VL) benchmarks use translation-based evaluation. This makes sense because we can explicitly train for language neutrality when parallel data is available. But... 🧵👇
October 21, 2025 at 1:30 PM
Most vision-language models only work in English. We explore how different parallel data types (machine-translated vs authentic captions) affect cross-lingual transfer. Key finding: authentic data can outperform machine translation, and multilingual training beats bilingual approaches. #NLP
September 1, 2025 at 3:38 PM
For evaluation researchers: Simple string-overlap metrics (BLEU, chrF) work surprisingly well for factual QA. 🤔 When answers are mostly named entities, exact matches matter more than we thought.

LLM-as-judge 🦙🧑‍⚖️ correlates best with human judgment, though.
August 25, 2025 at 8:06 AM
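To make the metric comparison in this post concrete, here is a minimal sketch contrasting exact match with a simplified sentence-level chrF (character n-gram F-score, n = 1..6, beta = 2, whitespace ignored). It is not sacrebleu's exact implementation and not the evaluation code behind these results. The function names and the example answers ("Sněžka", "Mount Sněžka", "Lysá hora") are hypothetical illustrations, not items from the dataset.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams after removing whitespace (as chrF does)."""
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def chrf(hypothesis: str, reference: str, max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified sentence-level chrF: F-beta of averaged char n-gram precision/recall."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # skip n-gram orders longer than one of the strings
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p == 0 and r == 0:
        return 0.0
    # beta = 2 weights recall twice as much as precision
    return (1 + beta**2) * p * r / (beta**2 * p + r)


def exact_match(hypothesis: str, reference: str) -> float:
    """1.0 if the case- and whitespace-normalized strings are identical, else 0.0."""
    return float(hypothesis.strip().lower() == reference.strip().lower())


if __name__ == "__main__":
    reference = "Sněžka"  # hypothetical gold answer (a named entity)
    for answer in ["Sněžka", "Mount Sněžka", "Lysá hora"]:
        print(answer, exact_match(answer, reference), round(chrf(answer, reference), 3))
```

Exact match gives "Mount Sněžka" no credit, while chrF rewards it for containing the entity. For short, entity-like answers both metrics mostly hinge on whether the right name is produced, which is one plausible reading of why string overlap works well here.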
The results are... humbling 😅
Even the best models:

>40% accuracy on textual questions
<30% on visual questions
Often perform better in English than in the local language (!!)

Visual QA with regional images is especially challenging.
August 25, 2025 at 8:06 AM
The problem: Most QA benchmarks focus on globally known facts. But real users ask about local geography, culture, and history.

We collected questions from native speakers in Czechia 🇨🇿, Slovakia 🇸🇰, and Ukraine 🇺🇦 about facts locals know but outsiders don't.
August 25, 2025 at 8:06 AM
If you are attending the virtual NAACL day on May 6 at 5 pm Central European Time, don't miss @kathaem.bsky.social presenting our work on the importance of semantic token overlap in multilingual language models. aclanthology.org/2025.naacl-s...
Beyond Literal Token Overlap: Token Alignability for Multilinguality
Katharina Hämmerl, Tomasz Limisiewicz, Jindřich Libovický, Alexander Fraser. Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics:...
April 30, 2025 at 12:50 PM