Raphaël Merx
@rapha.dev
PhD @ UniMelb
NLP, with a healthy dose of MT

Based in 🇮🇩, worked in 🇹🇱 🇵🇬, from 🇫🇷
This is some seriously impressive work!!
📢 Thrilled to introduce ATLAS 🗺️: the largest multilingual scaling study to date. We ran 774 experiments (10M–8B params, 400+ languages) to answer:

🌍 Does scaling differ by language?

🧙‍♂️ Can we model the curse of multilinguality?

⚖️ Pretrain from scratch vs finetune from a checkpoint?

🔀 What are the cross-lingual transfer scores across languages?

1/🧵
October 30, 2025 at 12:24 PM
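For readers new to scaling laws: studies like this typically fit per-language power laws of the form L(N) ≈ A·N^(−α) relating loss to parameter count. A minimal sketch of recovering A and α by least squares in log-log space; all data points below are synthetic, invented purely for illustration, and this is not the paper's actual fitting procedure:

```python
import math

def fit_power_law(params, losses):
    """Fit loss ≈ A * N**(-alpha) by least squares in log-log space.

    Returns (A, alpha). Simplification: assumes the irreducible-loss
    term E in the fuller L(N) = E + A/N**alpha form is negligible.
    """
    xs = [math.log(n) for n in params]
    ys = [math.log(l) for l in losses]
    k = len(xs)
    mx, my = sum(xs) / k, sum(ys) / k
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
        / sum((x - mx) ** 2 for x in xs)
    alpha = -slope                      # loss decreases as N grows
    A = math.exp(my - slope * mx)
    return A, alpha

# Synthetic points generated from L = 20 * N**-0.1 (illustrative only)
params = [1e7, 1e8, 1e9, 8e9]
losses = [20 * n ** -0.1 for n in params]
A, alpha = fit_power_law(params, losses)
print(round(A, 2), round(alpha, 3))  # recovers A ≈ 20.0, alpha ≈ 0.1
```

On noiseless synthetic data the fit recovers the generating constants exactly; with real per-language loss curves you would fit one (A, α) pair per language and compare exponents.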
Reposted by Raphaël Merx
Introducing Global PIQA, a new multilingual benchmark for 100+ languages. This benchmark is the outcome of this year’s MRL shared task, in collaboration with 300+ researchers from 65 countries. This dataset evaluates physical commonsense reasoning in culturally relevant contexts.
October 29, 2025 at 3:50 PM
Whoa, the #WMT25 results on MT evaluation are wild! chrF outperforms pretty much all neural metrics 🙀
October 18, 2025 at 5:17 AM
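For context, chrF scores a translation by character n-gram F-score, averaging F_β (β = 2, weighting recall) over n = 1..6. A minimal single-pair sketch; this is a simplified illustration, not the reference sacrebleu implementation (no whitespace handling or corpus-level aggregation):

```python
from collections import Counter

def chrf(hypothesis: str, reference: str, max_n: int = 6, beta: float = 2.0) -> float:
    """Minimal chrF: average character n-gram F-beta over n = 1..max_n."""
    def ngrams(text, n):
        return Counter(text[i:i + n] for i in range(len(text) - n + 1))

    f_scores = []
    for n in range(1, max_n + 1):
        hyp, ref = ngrams(hypothesis, n), ngrams(reference, n)
        if not hyp or not ref:
            continue  # string shorter than n
        overlap = sum((hyp & ref).values())   # clipped n-gram matches
        prec = overlap / sum(hyp.values())
        rec = overlap / sum(ref.values())
        if prec + rec == 0:
            f_scores.append(0.0)
            continue
        f_scores.append((1 + beta**2) * prec * rec / (beta**2 * prec + rec))
    return 100 * sum(f_scores) / len(f_scores) if f_scores else 0.0

print(round(chrf("the cat sat", "the cat sat"), 1))  # identical strings -> 100.0
```

Being purely surface-level and character-based is exactly why chrF beating neural metrics is so surprising.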
In Vienna for ACL, presenting Tulun, a system for low-resource in-domain translation using LLMs
Tuesday @ 4pm

Working with 2 real use cases: medical translation into Tetun 🇹🇱 & disaster relief speech translation in Bislama 🇻🇺
July 27, 2025 at 4:00 PM
Cool paper, at the intersection of grammar and LLM interpretability.

I like that they use linguistic datasets for their experiments, then get results that can contribute to linguistics as a field too! (on structural priming vs L1/L2)
My paper with @tylerachang.bsky.social and @jamichaelov.bsky.social will appear at #ACL2025NLP! The updated preprint is available on arxiv. I look forward to chatting about bilingual models in Vienna!
✨New pre-print✨ Crosslingual transfer allows models to leverage their representations for one language to improve performance on another language. We characterize the acquisition of shared representations in order to better understand how and when crosslingual transfer happens.
June 8, 2025 at 4:47 AM
Our paper on who uses tetun.org, and what for, got published at the LoResMT 2025 workshop! An emotional paper for me, going back to the project that got me into a machine learning PhD in the first place.
May 25, 2025 at 1:11 AM
My favourite ICLR paper so far. Methodology, findings and their implications are all very cool.

In particular Fig. 2 + this discussion point:
May 8, 2025 at 10:20 AM
Incredible paper, finding that large companies can game the LMArena through statistical noise (via many model submissions), over-sampling of their models, and overfitting to Arena-style prompts (without real gains in model reasoning).

The experiments they run to show this are pretty cool too!
It is critical for scientific integrity that we trust our measure of progress.

The @lmarena.bsky.social has become the go-to evaluation for AI progress.

Our release today demonstrates the difficulty in maintaining fair evaluations on the Arena, despite best intentions.
May 2, 2025 at 1:28 PM
Cool summary of issues with multilingual LLM eval, and potential solutions!

If you're doubtful of all these non-reproducible evals on translated multiple choice questions, this paper is for you
📖New preprint with Eleftheria Briakou @swetaagrawal.bsky.social @mziizm.bsky.social @kocmitom.bsky.social!

arxiv.org/abs/2504.11829

🌍It reflects experiences from my personal research journey: coming from MT into multilingual LLM research I missed reliable evaluations and evaluation research…
April 23, 2025 at 9:50 AM
GlotEval - a unified framework for multilingual eval of LLMs, on 7 different tasks, by @tiedeman.bsky.social @helsinki-nlp.bsky.social

Just wish it supported eval of closed models (e.g. through LiteLLM?)

github.com/MaLA-LM/Glot...
April 11, 2025 at 7:41 AM
Reposted by Raphaël Merx
👋 Hey Bluesky!

We’ve just touched down and we’re excited to be here 🌤️🐍

This is the official PyCon AU account, your go-to space for updates, announcements, and all things Python in Australia✨

Hit that follow button and stay tuned because we’ve got some awesome things coming your way!

#PyConAU
March 30, 2025 at 10:30 PM
Reposted by Raphaël Merx
😼 SMOL DATA ALERT! 😼 Announcing SMOL, a professionally-translated dataset for 115 very low-resource languages! Paper: arxiv.org/pdf/2502.12301
Huggingface: huggingface.co/datasets/goo...
February 19, 2025 at 5:36 PM
Reposted by Raphaël Merx
Been hearing a lot about recency bias lately. Must be pretty important
January 15, 2025 at 3:10 AM
Our paper on generating bilingual example sentences with LLMs got best paper award @ ALTA in Canberra!

arxiv.org/abs/2410.03182

We work with French / Indonesian / Tetun, find that annotators don't agree about what's a "good example", but that LLMs can align with a specific annotator.
December 5, 2024 at 3:12 AM
Another example of why we need evals that take clinical risk into account when training NLP models for health
slator.com/openais-whis...
November 22, 2024 at 1:27 PM
Reposted by Raphaël Merx
What does it mean to be a “low-resourced” language? I’ve seen definitions ranging from less training data to low numbers of speakers. Great to see this important clarifying work at #EMNLP2024 from @hellinanigatu.bsky.social et al

aclanthology.org/2024.emnlp-m...
November 15, 2024 at 9:43 PM
this guy lives rent free in my hippocampus
November 20, 2024 at 7:01 AM
Is productionisation (and the move to gimmicks like CoT in o1-preview) at OpenAI and Anthropic a sign that scaling laws are slowing? And if so, where are we headed with LLMs?

Slightly pretentious but enjoyable read: www.generalist.com/briefing/the...
November 20, 2024 at 6:55 AM