@changemily.bsky.social
There is a strong linear correlation between MT and Word Translation performance. MT datasets are expensive to come by; in their absence, ChiKhaPo can provide a cheap proxy for MT performance.
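As a sketch of how that kind of correlation can be checked, assuming per-language Word Translation and MT scores have already been collected (the numbers below are placeholders, not actual results):

```python
import numpy as np

# Hypothetical per-language scores (placeholders only).
word_translation = np.array([0.12, 0.35, 0.58, 0.71, 0.90])  # ChiKhaPo-style accuracy
mt_quality       = np.array([8.0, 21.0, 33.0, 40.0, 52.0])   # e.g. BLEU on the same languages

# Pearson correlation between the two score vectors.
r = np.corrcoef(word_translation, mt_quality)[0, 1]
print(f"Pearson r = {r:.2f}")
```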
November 24, 2025 at 11:46 PM
Here’s a plot of language resource level against the model’s task performance. The relationship is logarithmic: the long tail of low-resource languages does very badly, and performance improves quickly for mid-resource languages.
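A rough sketch of how such a plot could be drawn, assuming per-language pairs of resource level and task score (all values below are placeholders):

```python
import matplotlib.pyplot as plt

# Hypothetical (resource level, score) pairs, e.g. available text vs. accuracy.
resource_level = [1e4, 1e5, 1e6, 1e7, 1e8, 1e9]
score          = [0.05, 0.10, 0.25, 0.45, 0.60, 0.68]

plt.scatter(resource_level, score)
plt.xscale("log")  # log x-axis: the long tail sits far left, mid-resource gains are steep
plt.xlabel("Language resource level (e.g. available text)")
plt.ylabel("Task performance")
plt.show()
```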
November 24, 2025 at 11:45 PM
When we group SOTA model results by language family, the performance gap between Indo-European languages and underrepresented Austronesian and Atlantic-Congo languages becomes evident.
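As a rough illustration of that kind of family-level aggregation (language codes, families, and scores below are placeholders, not ChiKhaPo results):

```python
import pandas as pd

# Hypothetical per-language results; the real benchmark covers far more languages.
df = pd.DataFrame({
    "language": ["spa", "hin", "ceb", "lin"],
    "family":   ["Indo-European", "Indo-European", "Austronesian", "Atlantic-Congo"],
    "score":    [0.82, 0.64, 0.31, 0.18],
})

# Mean score per family makes the gap between families visible at a glance.
print(df.groupby("family")["score"].mean().sort_values(ascending=False))
```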
November 24, 2025 at 11:44 PM
Results on 6 SOTA models show that there remains significant room for improvement across all 8 subtasks: ChiKhaPo is a challenging measure of multilingual performance at the lexical level.
November 24, 2025 at 11:44 PM
ChiKhaPo draws from numerous publicly available resources and can be easily extended to even more languages as these resources expand (a toy merge sketch follows the list):

📗 translation lexicons (PANLEX, IDS, GATITOS),
📃 monolingual text (GLOTLID), and
📖 bitext (FLORES+)
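A toy merge sketch, assuming (word, English gloss) pairs have been exported to simple two-column TSVs; the real PanLex / IDS / GATITOS distributions each use their own schemas:

```python
import csv
from collections import defaultdict

def load_pairs(path: str) -> dict[str, set[str]]:
    """Read a hypothetical two-column TSV: source word, English gloss."""
    lexicon: dict[str, set[str]] = defaultdict(set)
    with open(path, newline="", encoding="utf-8") as f:
        for src_word, english_gloss in csv.reader(f, delimiter="\t"):
            lexicon[src_word].add(english_gloss)
    return lexicon

def merge(*lexicons: dict[str, set[str]]) -> dict[str, set[str]]:
    """Union the acceptable glosses for each source word across lexicon exports."""
    merged: dict[str, set[str]] = defaultdict(set)
    for lex in lexicons:
        for word, glosses in lex.items():
            merged[word] |= glosses
    return merged
```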
November 24, 2025 at 11:43 PM
Models in ChiKhaPo are evaluated on their ability to translate words into English (comprehension, X→English) and from English (generation, English→X), across 4 settings and 2 directions. We illustrate all 8 subtasks below.
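As a toy sketch of what a single comprehension-direction subtask could look like, with a stand-in model and a tiny hypothetical lexicon rather than ChiKhaPo's actual data or prompts:

```python
# Hypothetical bilingual lexicon: source word -> acceptable English translations.
lexicon = {"perro": {"dog"}, "gato": {"cat"}, "casa": {"house", "home"}}

def toy_model(prompt: str) -> str:
    """Stand-in for an LLM call; replace with a real model client."""
    canned = {"perro": "dog", "gato": "cat", "casa": "building"}
    word = prompt.split("'")[1]  # pull the quoted source word back out of the prompt
    return canned.get(word, "")

def score_comprehension(lang: str, lexicon: dict[str, set[str]]) -> float:
    """Fraction of source words the model translates into an acceptable English word."""
    correct = 0
    for src_word, gold in lexicon.items():
        prompt = f"Translate the {lang} word '{src_word}' into English. Answer with one word."
        pred = toy_model(prompt).strip().lower()
        correct += int(pred in gold)
    return correct / len(lexicon)

print(score_comprehension("Spanish", lexicon))  # 0.67 with this toy model
```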
November 24, 2025 at 11:42 PM
Frustrated with how most of the world’s low-resource languages have NO evaluation resources?

📢 Check out ChiKhaPo, a massively multilingual lexical comprehension and generation benchmark covering 2700+ languages.
www.arxiv.org/abs/2510.16928
November 24, 2025 at 11:41 PM