Kirill Semenov
kiryukhasemenov.bsky.social
PhD student at the University of Zurich. Trying to get to know what LLMs know🤔
Let's meet at #EMNLP and talk about multilingual knowledge benchmarks!

⚠️MLAMA is full of disfluent sentences
❓Reason: templated translation
💡Simple full-sentence translation improves factual retrieval by up to 25%
🙌Remember to check your benchmarks with speakers!

Link: arxiv.org/pdf/2510.15115
October 28, 2025 at 9:09 PM
🎉 Terminology Shared Task @WMT25: Paper Out 🎉
Highlights:
- sentence translation seems solvable, document translation is still challenging
- better systems benefit more from proper terminologies
- term-based metrics correlate poorly with general translation quality

www2.statmt.org/wmt25/pdf/20...
October 24, 2025 at 7:57 PM
Our paper at TokShop

InCa and InDia: more stable and interpretable tokenizer preprocessing that handles casing and diacritization!

Check out our:
💻package: github.com/Kiryukhaseme...
🎥video: www.youtube.com/watch?v=XgDP...
📝paper: openreview.net/pdf?id=9GwVW...
GitHub - Kiryukhasemenov/InFlags: Python package for dictionary-based inline tokenization preprocessing
July 25, 2025 at 12:54 PM
📣Take part in the 3rd Terminology Shared Task @WMT!📣
This year:
👉5 language pairs: EN->{ES, RU, DE, ZH},
👉2 tracks - sentence-level and doc-level translation,
👉authentic data from 2 domains: finance and IT!

www2.statmt.org/wmt25/termin...

Don't miss the opportunity - we only do it once every two years😏
Terminology Translation Task
June 6, 2025 at 3:54 PM