Ilker Kesen
@ilkerkesen.bsky.social
Postdoctoral Scientist at the University of Copenhagen, currently focused on developing pixel language models. #nlproc #multilinguality #multimodality
This week at #EMNLP2025, I'll present our research on pretraining a multilingual pixel language model. Join the multilinguality session on Friday at 10:30 in Room A301 to learn more about pixel models and their benefits in multilingual settings. (Unfortunately I’ll be on Zoom)
November 3, 2025 at 5:39 PM
Furthermore, we assessed the informativeness of each task using Gini coefficients. We found that grammatical error correction, machine translation, and extractive QA (on Turkish and Islamic history) are the most informative tasks for evaluating LLMs in Turkish within 📏Cetvel.
September 5, 2025 at 1:40 PM
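For readers curious about the informativeness measure: below is a minimal sketch of how one might compute a task's Gini coefficient over per-model scores. The scores are hypothetical, and Cetvel's exact protocol may differ.

    import numpy as np

    def gini(scores):
        # Gini coefficient of non-negative scores: higher means the task
        # spreads models further apart, i.e. it ranks them more informatively.
        x = np.sort(np.asarray(scores, dtype=float))
        n = x.size
        if x.sum() == 0:
            return 0.0
        index = np.arange(1, n + 1)
        # Standard closed form over values sorted in ascending order.
        return 2 * (index * x).sum() / (n * x.sum()) - (n + 1) / n

    # Hypothetical per-model accuracies on one task:
    print(f"Gini: {gini([0.12, 0.35, 0.41, 0.58, 0.77]):.3f}")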
We tested 33 widely used open-weight LLMs covering different model families up to 70B parameters. We find that (i) LLMs tailored for Turkish underperform compared to general-purpose LLMs, (ii) Llama 3 models dominate other LLMs at the same parameter scale, [...]
September 5, 2025 at 1:40 PM
Second, 📏Cetvel also offers NLP tasks linguistically and culturally grounded in Turkish, such as proverb understanding, circumflex-based word sense disambiguation, and extractive QA centered on Turkish and Islamic history.
September 5, 2025 at 1:40 PM
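For context on the circumflex task: Turkish has word pairs distinguished in writing only by the circumflex accent, so a model must use context to pick the right sense. A few well-known pairs (illustrative; not necessarily the benchmark's actual items):

    # Illustrative Turkish circumflex minimal pairs with English glosses.
    circumflex_pairs = {
        ("kar", "kâr"): ("snow", "profit"),
        ("hala", "hâlâ"): ("paternal aunt", "still / yet"),
        ("adet", "âdet"): ("count, piece", "custom, tradition"),
    }
    for (plain, circ), (g1, g2) in circumflex_pairs.items():
        print(f"{plain} = {g1!r}  vs.  {circ} = {g2!r}")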
First, 📏Cetvel goes beyond multiple-choice QA in contrast to existing Turkish benchmarks. It spans 23 tasks across 7 categories, including grammatical error correction, machine translation, summarization, and extractive QA.
September 5, 2025 at 1:40 PM
📢New preprint: We introduce 📏Cetvel, a unified benchmark for evaluating language understanding, generation, and cultural capacity of LLMs in Turkish🇹🇷 #AI #LLM #NLProc

Joint work with Abrek Er, @gozdegulsahin.bsky.social, @aykuterdem.bsky.social from KUIS AI Center.
September 5, 2025 at 1:40 PM
Data-efficiency analysis on the Indic NER benchmark also demonstrated that PIXEL-M4 excels at cross-lingual transfer learning in low-resource settings.
June 4, 2025 at 1:45 PM
Analyses of the learned multilingual hidden representations reveal strong semantic alignment between pretraining languages in the later layers, particularly for the English-Ukrainian and English-Hindi pairs.
June 4, 2025 at 1:45 PM
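A minimal sketch of one way to quantify such layer-wise alignment: mean-pool the hidden states of a sentence and its translation at each layer, then compare them with cosine similarity. The paper's exact metric may differ, and the random arrays below are stand-ins for real encoder outputs.

    import numpy as np

    def layerwise_alignment(src_states, tgt_states):
        # src_states, tgt_states: (layers, tokens, dim) hidden states for a
        # source sentence and its translation; returns one score per layer.
        src = src_states.mean(axis=1)  # mean-pool over tokens -> (layers, dim)
        tgt = tgt_states.mean(axis=1)
        num = (src * tgt).sum(axis=1)
        den = np.linalg.norm(src, axis=1) * np.linalg.norm(tgt, axis=1)
        return num / den

    rng = np.random.default_rng(0)
    en = rng.normal(size=(12, 9, 768))  # stand-in for an English sentence
    uk = rng.normal(size=(12, 7, 768))  # stand-in for its Ukrainian translation
    print(layerwise_alignment(en, uk))  # alignment score per layer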
Word-level probing analyses show that PIXEL-M4 captures linguistic features better than its English-only counterpart, even on languages and writing systems not seen during pretraining.
June 4, 2025 at 1:45 PM
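For those unfamiliar with probing: a linear probe is a small classifier trained on frozen representations; if it predicts a linguistic feature well, the representations encode that feature. A minimal sketch with logistic regression over hypothetical per-word vectors and POS-style labels (the paper's probing setup may differ):

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    # Stand-ins for frozen word-level representations and their labels;
    # in practice these would come from the pretrained encoder.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(2000, 768))    # one vector per word
    y = rng.integers(0, 17, size=2000)  # e.g. 17 UD POS tags

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    print(f"probe accuracy: {probe.score(X_te, y_te):.3f}")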
Downstream experiments on text classification, dependency parsing, and named entity recognition show that PIXEL-M4 outperforms PIXEL-BIGRAMS, its English-only pretrained counterpart, on almost all non-Latin-script languages.
June 4, 2025 at 1:45 PM
Announcing our recent work “Multilingual Pretraining for Pixel Language Models”! We introduce PIXEL-M4, a pixel language model pretrained on four visually & linguistically diverse languages: English, Hindi, Ukrainian & Simplified Chinese. #NLProc
June 4, 2025 at 1:45 PM
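For readers new to pixel language models: instead of a tokenizer, the text is rendered as an image and fed to a ViT-style encoder as fixed-size patches. Below is a rough sketch of the rendering step using PIL's default bitmap font; the actual PIXEL pipeline uses a full Unicode text renderer and a masked-autoencoding pretraining objective.

    import numpy as np
    from PIL import Image, ImageDraw

    def render_to_patches(text, height=16, patch=16, n_patches=32):
        # Render text onto a white grayscale strip, then slice the strip
        # into patch-sized columns -- the input units of a pixel LM.
        width = patch * n_patches
        img = Image.new("L", (width, height), color=255)
        ImageDraw.Draw(img).text((0, 2), text, fill=0)
        arr = np.asarray(img)  # (height, width)
        return arr.reshape(height, n_patches, patch).transpose(1, 0, 2)

    patches = render_to_patches("Pixels instead of tokens!")
    print(patches.shape)  # (32, 16, 16): 32 patches of 16x16 pixels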