Ilker Kesen
ilkerkesen.bsky.social
Postdoctoral Scientist at the University of Copenhagen. I am currently focused on developing pixel language models. #nlproc #multilinguality #multimodality
We used to develop Knet, but after the rise of HuggingFace, we stopped using it. Before HF, it was also painful to convert each model to Julia code and arrays when the models were already available in PyTorch/TensorFlow. Still, I guess you can find BERT/GPT implementations in Knet somewhere.
October 8, 2025 at 12:37 PM
For more details about 📏Cetvel, please check our preprint.

📜Paper: arxiv.org/abs/2508.16431
💻Code: github.com/KUIS-AI/cetvel
📊Leaderboard: huggingface.co/spaces/KUIS-...
September 5, 2025 at 1:40 PM
Furthermore, we assessed the informativeness of each task using Gini coefficients. We found that grammatical error correction, machine translation and extractive QA (about Turkish / Islam history) are the most informative tasks for evaluating LLMs in Turkish within 📏Cetvel.
September 5, 2025 at 1:40 PM
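For readers unfamiliar with the metric: a minimal sketch of computing a Gini coefficient over per-model task scores. The idea is that a task where all models score alike separates them poorly, while widely spread scores make the task more informative. The numbers below are made up for illustration, not from the paper.

```python
import numpy as np

def gini(scores):
    """Gini coefficient of a set of per-model scores on one task.

    Higher values mean scores are spread more unevenly across models,
    i.e. the task discriminates between models better.
    """
    x = np.sort(np.asarray(scores, dtype=float))
    n = x.size
    # Standard formula: G = (2 * sum_i i * x_(i)) / (n * sum x) - (n + 1) / n
    return (2 * np.sum(np.arange(1, n + 1) * x)) / (n * x.sum()) - (n + 1) / n

# A task where every model scores the same is uninformative (G ≈ 0);
# a task with widely spread scores is informative (G closer to 1).
flat = gini([0.5, 0.5, 0.5, 0.5])     # ≈ 0.0
spread = gini([0.05, 0.1, 0.4, 0.9])  # noticeably higher
```

This is only one way to operationalize "informativeness"; the preprint is the authoritative source for the exact protocol.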
and lastly (iii) the Turkish-centric 8B-parameter model Cere-Llama-3-8B outperforms even the 70B-parameter Llama-3.3-70B on some Turkish-centric tasks, such as grammatical error correction.
September 5, 2025 at 1:40 PM
We tested 33 widely used open-weight LLMs covering different model families up to 70B parameters. We find that (i) LLMs tailored for Turkish underperform compared to general-purpose LLMs, (ii) Llama 3 models dominate other LLMs at the same parameter scale, [...]
September 5, 2025 at 1:40 PM
Second, 📏Cetvel also offers NLP tasks linguistically and culturally grounded in Turkish, such as proverb understanding, circumflex-based word sense disambiguation, and extractive QA centered on Turkish and Islam history.
September 5, 2025 at 1:40 PM
First, 📏Cetvel goes beyond multiple-choice QA in contrast to existing Turkish benchmarks. It spans 23 tasks across 7 categories, including grammatical error correction, machine translation, summarization, and extractive QA.
September 5, 2025 at 1:40 PM
So, why another Turkish benchmark? The answer is that existing benchmarks often fall short, offering either limited task diversity or a lack of content culturally relevant to Turkish. 📏Cetvel addresses both shortcomings.
September 5, 2025 at 1:40 PM
For more details about PIXEL-M4, please check our preprint.

Paper: arxiv.org/abs/2505.21265
Model: huggingface.co/Team-PIXEL/p...
Code: github.com/ilkerkesen/p...

In collaboration with Jonas F. Lotz, Ingo Ziegler, Phillip Rust and Desmond Elliott @delliott.bsky.social
Multilingual Pretraining for Pixel Language Models
June 4, 2025 at 1:45 PM
Data-efficiency analysis on the Indic NER benchmark also demonstrated that PIXEL-M4 excels at cross-lingual transfer learning in low-resource settings.
June 4, 2025 at 1:45 PM
Investigations on learned multilingual hidden representations reveal a strong semantic alignment between pretraining languages in the later layers, particularly for English-Ukrainian and English-Hindi pairs.
June 4, 2025 at 1:45 PM
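One simple way such cross-lingual alignment can be quantified (a sketch under my own assumptions, not necessarily the paper's exact protocol) is the mean cosine similarity between hidden representations of parallel sentences in two languages, computed layer by layer. Random vectors stand in for real model states here:

```python
import numpy as np

def alignment(h_src, h_tgt):
    """Mean cosine similarity between mean-pooled hidden states of
    parallel sentences in two languages (one row per sentence)."""
    a = h_src / np.linalg.norm(h_src, axis=1, keepdims=True)
    b = h_tgt / np.linalg.norm(h_tgt, axis=1, keepdims=True)
    return float(np.mean(np.sum(a * b, axis=1)))

rng = np.random.default_rng(0)
en = rng.normal(size=(8, 16))             # stand-in "English" representations
uk = en + 0.1 * rng.normal(size=(8, 16))  # nearly aligned "Ukrainian" ones
print(alignment(en, uk))                  # close to 1 → strong alignment
```

Computed per layer, a curve of this score rising toward the later layers would reflect the kind of alignment described above.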
Word-level probing analyses illustrate that PIXEL-M4 captures linguistic features better, even on languages and writing systems not seen during pretraining.
June 4, 2025 at 1:45 PM
Downstream experiments on text classification, dependency parsing, and named entity recognition show that PIXEL-M4 outperforms its English-only-pretrained counterpart PIXEL-BIGRAMS on almost all non-Latin-script languages.
June 4, 2025 at 1:45 PM
The LLM feedback I received read like paraphrased versions of my reviews, just trying to convince me to be *more explicit* even though (I think) I was already sufficiently explicit. I did not change anything, and the authors were able to understand the points I raised.
November 25, 2024 at 5:22 PM