Lightnews — Scholar-powered news

Jindřich Libovický

@jlibovicky.bsky.social

Happy holidays! 🎄🎅🤩🎁

December 23, 2025 at 10:51 AM

Jindřich Libovický

@jlibovicky.bsky.social

Attenzione! 🇮🇹 Know Piedmontese or Neapolitan speakers? @gianlucavico.bsky.social is collecting crowd-sourced translations to evaluate LLM performance on these regional languages. Partecipate!

Gianluca Vico @gianlucavico.bsky.social · Nov 10

We’re collecting crowd-sourced translations in Piedmontese and Neapolitan.
🎯 Goal: see how well LLMs understand these languages.
👉 Participate here (in IT🇮🇹):
- Piedmontese: quest.ms.mff.cuni.cz/crowd-transl...
- Neapolitan: quest.ms.mff.cuni.cz/crowd-transl...
Anyone can join, no need to be fluent!

Welcome to CrowdTranslation

quest.ms.mff.cuni.cz

November 10, 2025 at 2:36 PM

Jindřich Libovický

@jlibovicky.bsky.social

With @andrei-a-manea.bsky.social, we posted a survey on multilingual vision-language models 👉 arxiv.org/pdf/2509.22123
We reviewed 31 models+21 benchmarks. There's a tension between language neutrality (same results across languages) & cultural awareness (context matters differently across cultures)

arxiv.org

October 21, 2025 at 1:30 PM

Jindřich Libovický

@jlibovicky.bsky.social

So proud of my PhD student @andrei-a-manea.bsky.social for his first first-author publication! 🎉 He presented this work last week at TSD. Investigating the Effect of Parallel Data in the Cross-Lingual Transfer for Vision-Language Encoders arxiv.org/pdf/2504.21681

September 1, 2025 at 3:38 PM

Jindřich Libovický

@jlibovicky.bsky.social

🧵 We're releasing CUS-QA - a new benchmark for testing LLMs on regional knowledge!
Find out what your model knows about Czechia 🇨🇿, Slovakia 🇸🇰, and Ukraine 🇺🇦!
👉 Textual and visual questions, answers, and human judgment on model outputs!
huggingface.co/datasets/ufa...
www.arxiv.org/abs/2507.22752

ufal/cus-qa · Datasets at Hugging Face

We’re on a journey to advance and democratize artificial intelligence through open source and open science.

huggingface.co

August 25, 2025 at 8:06 AM

Jindřich Libovický

@jlibovicky.bsky.social

Stay tuned, we will release the dataset soon...

Institute of Formal and Applied Linguistics @ufal.mff.cuni.cz · Jul 31

CUS-QA: Local-Knowledge-Oriented Open-Ended Question Answering Dataset arxiv.org/abs/2507.22752
by @jlibovicky.bsky.social , ‪@jindrahelcl.bsky.social, @andrei-a-manea.bsky.social
Question that foreigners don't know the answer to + human judgment on question generation

August 1, 2025 at 4:49 PM

Reposted by Jindřich Libovický

Jindra Helcl

@jindrahelcl.bsky.social

We need to have poster fights at the end of every conference.

July 29, 2025 at 7:01 PM

Jindřich Libovický

@jlibovicky.bsky.social

Just presented MAGBIG, a new dataset and evaluation methodology for gender bias in multilingual text-to-image generation. Grammatical gender matters when studying these biases across languages!
Thanks to Felix Friedrich, @kathaem.bsky.social and all co-authors - it was fun to work on this together!

Institute of Formal and Applied Linguistics @ufal.mff.cuni.cz · Jul 28

Multilingual Text-to-Image Generation Magnifies Gender Stereotypes
aclanthology.org/2025.acl-lon...
by Felix Friedrich, @kathaem.bsky.social, Patrick Schramowski, @mbrackaiml.bsky.social , @jlibovicky.bsky.social, @kerstingaiml.bsky.social, Alex Fraser

July 28, 2025 at 1:14 PM

Jindřich Libovický

@jlibovicky.bsky.social

This week I am at #ACL2025NLP in Vienna 🎡🇦🇹. Find me 🕵️ or message 💌 me if you want to chat about multilinguality or tokenization. Stop 🛑 by our poster on gender bias in text-to-image generation on Monday aclanthology.org/2025.acl-lon...

July 27, 2025 at 7:24 AM

Reposted by Jindřich Libovický

Tokenization Workshop (TokShop) @ICML2025

@tokshop.bsky.social

TokShop @ #ICML2025 got way more submissions than expected! 📈 We could really use a few more reviewers to help out. If you have the capacity to review a #tokenization paper by Saturday, please fill out this form: forms.gle/32A6sQHQrMSb... 🙏

TokShop 2025

Registering interest in all things tokenization at TokShop @ ICML 2025 (July 18) Consider joining the Google group for future updates! https://groups.google.com/g/tokshop

forms.gle

June 2, 2025 at 4:40 PM

Reposted by Jindřich Libovický

Tokenization Workshop (TokShop) @ICML2025

@tokshop.bsky.social

📣 Call for Paper Alert: TokShop @ ICML 2025
TokShop explores tokenization across all data modalities. Topics include: subword NLP techniques, multimodal approaches, multilingual challenges, post-training modification, alternative representations, and statistical perspectives.

ICML 2025 Workshop TokShop

Welcome to the OpenReview homepage for ICML 2025 Workshop TokShop

openreview.net

May 14, 2025 at 1:31 PM

Reposted by Jindřich Libovický

Tokenization Workshop (TokShop) @ICML2025

@tokshop.bsky.social

Got a tokenization paper that just didn't make the cut for ICML? Submit it to the Tokenization Workshop TokShop at #ICML2025 -- we'd love to see it there!
tokenization-workshop.github.io

Tokenization Workshop @ ICML 2025

tokenization-workshop.github.io

May 4, 2025 at 7:27 PM

Jindřich Libovický

@jlibovicky.bsky.social

Attending #NAACL2025 virtually. Since 2022, I've been training a classifier on papers I read to tackle the arXiv madness. Ran it on the NAACL proceedings for my personalized watch list. 🤓📺 However, it's far from perfect: Multilingual cultural awareness is great, but where is tokenization? 🤷

April 30, 2025 at 12:50 PM

Jindřich Libovický

@jlibovicky.bsky.social

We're organizing ✨Tokenization Workhop✨ TokShop❗ Join us at @icmlconf.bsky.social in July in Vancouver 🇨🇦. Follow @tokshop.bsky.social for updates! Submit your paper by May 30.

Tokenization Workshop (TokShop) @ICML2025 @tokshop.bsky.social · Apr 15

🚨 NEW WORKSHOP ALERT 🚨

We're thrilled to announce the first-ever Tokenization Workshop (TokShop) at #ICML2025 @icmlconf.bsky.social! 🎉

Submissions are open for work on tokenization across all areas of machine learning.

📅 Submission deadline: May 30, 2025
🔗 tokenization-workshop.github.io

Tokenization Workshop @ ICML 2025

tokenization-workshop.github.io

April 15, 2025 at 5:37 PM

Jindřich Libovický

@jlibovicky.bsky.social

Random take on the #TuringTest: Rather than testing machine intelligence, it can be a measure of societal awareness about #AI capabilities. The real objective isn't creating a machine that passes but educating people to think critically and avoid being deceived, so the machines do not pass the test.

April 4, 2025 at 7:37 PM

Jindřich Libovický

@jlibovicky.bsky.social

Summaries of pre-prints that I noticed and liked on arXiv in March are now on my blog jlibovicky.github.io//2025/04/02/...

Highlights from Machine Translation and Multilinguality in March 2025

EuroBERT: Scaling Multilingual Encoders for European Languages

jlibovicky.github.io

April 2, 2025 at 9:06 PM

Jindřich Libovický

@jlibovicky.bsky.social

Our paper 'Beyond Literal Token Overlap: Token Alignability for Multilinguality' will be at #NAACL2025! We show that token alignability is a stronger predictor of cross-lingual transfer than literal token overlap.

Read it here: arxiv.org/abs/2502.06468

March 10, 2025 at 3:48 PM

Jindřich Libovický

@jlibovicky.bsky.social

Short notes about what pre-prints I noticed in December and January are now on my blog: jlibovicky.github.io/2025/02/07/M...

Highlights from Machine Translation and Multilinguality in December 2024 and January 2025

MuLan: Adapting Multilingual Diffusion Models for Hundreds of Languages with Negligible Cost

jlibovicky.github.io

February 7, 2025 at 7:29 PM

Jindřich Libovický

@jlibovicky.bsky.social

Join Mu-SHROOM 🍄, a SemEval 2025 shared task on detecting hallucination spans in multilingual LLM outputs! 🌍 Includes Czech with regional Czech questions 🇨🇿. Do you think you can spot when something isn’t true? 🤔 Try it out! 👉 helsinki-nlp.github.io/shroom #SemEval2025 #NLP

Welcome to SemEval-2025 Task-3 — Mu-SHROOM, the Multilingual Shared-task on Hallucinations and Related Observable Overgeneration Mistakes

helsinki-nlp.github.io

January 14, 2025 at 3:56 PM

Jindřich Libovický

@jlibovicky.bsky.social

Happy holidays! 🎄🎅🤩🎁

December 24, 2024 at 1:36 PM

Jindřich Libovický

@jlibovicky.bsky.social

Highlights from multilingual #NLP and machine translation papers I found on arXiv in November are now on my blog: jlibovicky.github.io/2024/12/05/M...

Highlights from Machine Translation and Multilinguality in November 2024

Mitigating Metric Bias in Minimum Bayes Risk Decoding

jlibovicky.github.io

December 6, 2024 at 5:07 PM

Jindřich Libovický

@jlibovicky.bsky.social

This is going to be fun! 🤓 We have three years to spend 6.5M CZK on improving multilingual tokenization. The goal is to make subwords more alignable across languages and help languages that suffer from over-segmentation with current models.

Institute of Formal and Applied Linguistics @ufal.mff.cuni.cz · Dec 3

Good news! 🥳 GAČR will fund two of our projects:
👉 @jlibovicky.bsky.social proposes to better tokenization for #LLMs and machine translation
👉 Veronika Kolářová will study syntactic features of Czech non-verbal predicates
➕ Dominik Macháček receives Postdoc Individual Fellowship! 💪

December 3, 2024 at 5:53 PM

Jindřich Libovický

@jlibovicky.bsky.social

Just shared my takeaways from #EMNLP2024 on my blog: jlibovicky.github.io//2024/11/21/...

Notes from EMNLP 2024

Last week, I was at EMNLP in Miami, and here are a few notes about what I saw at the conference.

jlibovicky.github.io

November 21, 2024 at 11:36 AM

Reposted by Jindřich Libovický

Institute of Formal and Applied Linguistics

@ufal.mff.cuni.cz

Hello Blue Sky! 👋 This is the official account of the Institute of Formal and Applied Linguistics (ÚFAL for short) at the Faculty of Mathematics and Physics, Charles University in Prague 🇨🇿. Here, we will share news from the life of the institute and our members.

November 19, 2024 at 7:09 PM

Jindřich Libovický

@jlibovicky.bsky.social

Hello, Blue Sky 👋 🦋 Looking forward to reading and sometimes also posting about #NLP and related stuff! 🤓

November 19, 2024 at 5:35 PM

Add to Home Screen

Light up
your news

Add to Home Screen

Light upyour news

Sign in to Lightnews

Sign up to start reading

Connect Bluesky

Connect with Bluesky

Light up
your news