Ilker Kesen
@ilkerkesen.bsky.social
Postdoctoral Scientist at the University of Copenhagen, currently focused on developing pixel language models. #nlproc #multilinguality #multimodality
This week at #EMNLP2025, I'll present our research on pretraining a multilingual pixel language model. Join the multilinguality session on Friday at 10:30 in Room A301 to learn more about pixel models and their benefits in multilingual settings. (Unfortunately I’ll be on Zoom)
November 3, 2025 at 5:39 PM
Furthermore, we assessed the informativeness of each task using Gini coefficients. We found that grammatical error correction, machine translation, and extractive QA (on Turkish and Islamic history) are the most informative tasks for evaluating LLMs in Turkish within 📏Cetvel.
September 5, 2025 at 1:40 PM
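For readers curious about the informativeness measure: below is a minimal sketch of how one might compute a task's Gini coefficient over per-model scores. The scores are hypothetical, and Cetvel's exact protocol may differ.

    import numpy as np

    def gini(scores):
        # Gini coefficient of non-negative scores: higher means the task
        # spreads models further apart, i.e. it ranks them more informatively.
        x = np.sort(np.asarray(scores, dtype=float))
        n = x.size
        if x.sum() == 0:
            return 0.0
        index = np.arange(1, n + 1)
        # Standard closed form over values sorted in ascending order.
        return 2 * (index * x).sum() / (n * x.sum()) - (n + 1) / n

    # Hypothetical per-model accuracies on one task:
    print(f"Gini: {gini([0.12, 0.35, 0.41, 0.58, 0.77]):.3f}")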
We tested 33 widely used open-weight LLMs covering different model families up to 70B parameters. We find that (i) LLMs tailored for Turkish underperform compared to general-purpose LLMs, (ii) Llama 3 models dominate other LLMs at the same parameter scale, [...]
September 5, 2025 at 1:40 PM
Second, 📏Cetvel also offers NLP tasks linguistically and culturally grounded in Turkish, such as proverb understanding, circumflex-based word sense disambiguation, and extractive QA centered on Turkish and Islamic history.
September 5, 2025 at 1:40 PM
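For context on the circumflex task: Turkish has word pairs distinguished in writing only by the circumflex accent, so a model must use context to pick the right sense. A few well-known pairs (illustrative; not necessarily the benchmark's actual items):

    # Illustrative Turkish circumflex minimal pairs with English glosses.
    circumflex_pairs = {
        ("kar", "kâr"): ("snow", "profit"),
        ("hala", "hâlâ"): ("paternal aunt", "still / yet"),
        ("adet", "âdet"): ("count, piece", "custom, tradition"),
    }
    for (plain, circ), (g1, g2) in circumflex_pairs.items():
        print(f"{plain} = {g1!r}  vs.  {circ} = {g2!r}")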
First, 📏Cetvel goes beyond multiple-choice QA in contrast to existing Turkish benchmarks. It spans 23 tasks across 7 categories, including grammatical error correction, machine translation, summarization, and extractive QA.
September 5, 2025 at 1:40 PM
📢New preprint: We introduce 📏Cetvel, a unified benchmark for evaluating language understanding, generation, and cultural capacity of LLMs in Turkish🇹🇷 #AI #LLM #NLProc

Joint work with Abrek Er, @gozdegulsahin.bsky.social, @aykuterdem.bsky.social from KUIS AI Center.
September 5, 2025 at 1:40 PM
Data-efficiency analysis on the Indic NER benchmark also demonstrated that PIXEL-M4 excels at cross-lingual transfer learning in low-resource settings.
June 4, 2025 at 1:45 PM
Analyses of the learned multilingual hidden representations reveal strong semantic alignment between pretraining languages in the later layers, particularly for the English-Ukrainian and English-Hindi pairs.
June 4, 2025 at 1:45 PM
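A minimal sketch of one way to quantify such layer-wise alignment: mean-pool the hidden states of a sentence and its translation at each layer, then compare them with cosine similarity. The paper's exact metric may differ, and the random arrays below are stand-ins for real encoder outputs.

    import numpy as np

    def layerwise_alignment(src_states, tgt_states):
        # src_states, tgt_states: (layers, tokens, dim) hidden states for a
        # source sentence and its translation; returns one score per layer.
        src = src_states.mean(axis=1)  # mean-pool over tokens -> (layers, dim)
        tgt = tgt_states.mean(axis=1)
        num = (src * tgt).sum(axis=1)
        den = np.linalg.norm(src, axis=1) * np.linalg.norm(tgt, axis=1)
        return num / den

    rng = np.random.default_rng(0)
    en = rng.normal(size=(12, 9, 768))  # stand-in for an English sentence
    uk = rng.normal(size=(12, 7, 768))  # stand-in for its Ukrainian translation
    print(layerwise_alignment(en, uk))  # alignment score per layer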
Word-level probing analyses show that PIXEL-M4 captures linguistic features better than its English-only counterpart, even on languages and writing systems not seen during pretraining.
June 4, 2025 at 1:45 PM
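For those unfamiliar with probing: a linear probe is a small classifier trained on frozen representations; if it predicts a linguistic feature well, the representations encode that feature. A minimal sketch with logistic regression over hypothetical per-word vectors and POS-style labels (the paper's probing setup may differ):

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    # Stand-ins for frozen word-level representations and their labels;
    # in practice these would come from the pretrained encoder.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(2000, 768))    # one vector per word
    y = rng.integers(0, 17, size=2000)  # e.g. 17 UD POS tags

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    print(f"probe accuracy: {probe.score(X_te, y_te):.3f}")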
Downstream experiments on text classification, dependency parsing, and named entity recognition show that PIXEL-M4 outperforms PIXEL-BIGRAMS, its English-only pretrained counterpart, on almost all non-Latin-script languages.
June 4, 2025 at 1:45 PM
Announcing our recent work “Multilingual Pretraining for Pixel Language Models”! We introduce PIXEL-M4, a pixel language model pretrained on four visually & linguistically diverse languages: English, Hindi, Ukrainian & Simplified Chinese. #NLProc
June 4, 2025 at 1:45 PM
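For readers new to pixel language models: instead of a tokenizer, the text is rendered as an image and fed to a ViT-style encoder as fixed-size patches. Below is a rough sketch of the rendering step using PIL's default bitmap font; the actual PIXEL pipeline uses a full Unicode text renderer and a masked-autoencoding pretraining objective.

    import numpy as np
    from PIL import Image, ImageDraw

    def render_to_patches(text, height=16, patch=16, n_patches=32):
        # Render text onto a white grayscale strip, then slice the strip
        # into patch-sized columns -- the input units of a pixel LM.
        width = patch * n_patches
        img = Image.new("L", (width, height), color=255)
        ImageDraw.Draw(img).text((0, 2), text, fill=0)
        arr = np.asarray(img)  # (height, width)
        return arr.reshape(height, n_patches, patch).transpose(1, 0, 2)

    patches = render_to_patches("Pixels instead of tokens!")
    print(patches.shape)  # (32, 16, 16): 32 patches of 16x16 pixels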