Raphaël Merx
@rapha.dev
PhD @ UniMelb
NLP, with a healthy dose of MT

Based in 🇮🇩, worked in 🇹🇱 🇵🇬, from 🇫🇷
This is some seriously impressive work!!
📢 Thrilled to introduce ATLAS 🗺️: the largest multilingual scaling study to date. We ran 774 experiments (10M–8B params, 400+ languages) to answer:

🌍 Does scaling differ by language?

🧙‍♂️ Can we model the curse of multilinguality?

⚖️ Pretrain from scratch vs finetune from a checkpoint?

🔀 What are the cross-lingual transfer scores across languages?

1/🧵
October 30, 2025 at 12:24 PM
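For readers new to scaling laws: studies like this typically fit per-language power laws of the form L(N) ≈ A·N^(−α) relating loss to parameter count. A minimal sketch of recovering A and α by least squares in log-log space; all data points below are synthetic, invented purely for illustration, and this is not the paper's actual fitting procedure:

```python
import math

def fit_power_law(params, losses):
    """Fit loss ≈ A * N**(-alpha) by least squares in log-log space.

    Returns (A, alpha). Simplification: assumes the irreducible-loss
    term E in the fuller L(N) = E + A/N**alpha form is negligible.
    """
    xs = [math.log(n) for n in params]
    ys = [math.log(l) for l in losses]
    k = len(xs)
    mx, my = sum(xs) / k, sum(ys) / k
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
        / sum((x - mx) ** 2 for x in xs)
    alpha = -slope                      # loss decreases as N grows
    A = math.exp(my - slope * mx)
    return A, alpha

# Synthetic points generated from L = 20 * N**-0.1 (illustrative only)
params = [1e7, 1e8, 1e9, 8e9]
losses = [20 * n ** -0.1 for n in params]
A, alpha = fit_power_law(params, losses)
print(round(A, 2), round(alpha, 3))  # recovers A ≈ 20.0, alpha ≈ 0.1
```

On noiseless synthetic data the fit recovers the generating constants exactly; with real per-language loss curves you would fit one (A, α) pair per language and compare exponents.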
Reposted by Raphaël Merx
Introducing Global PIQA, a new multilingual benchmark for 100+ languages. This benchmark is the outcome of this year’s MRL shared task, in collaboration with 300+ researchers from 65 countries. This dataset evaluates physical commonsense reasoning in culturally relevant contexts.
October 29, 2025 at 3:50 PM
Whoa, the #WMT25 results on MT evaluation are wild! chrF outperforms pretty much all neural metrics 🙀
October 18, 2025 at 5:17 AM
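For context, chrF scores a translation by character n-gram F-score, averaging F_β (β = 2, weighting recall) over n = 1..6. A minimal single-pair sketch; this is a simplified illustration, not the reference sacrebleu implementation (no whitespace handling or corpus-level aggregation):

```python
from collections import Counter

def chrf(hypothesis: str, reference: str, max_n: int = 6, beta: float = 2.0) -> float:
    """Minimal chrF: average character n-gram F-beta over n = 1..max_n."""
    def ngrams(text, n):
        return Counter(text[i:i + n] for i in range(len(text) - n + 1))

    f_scores = []
    for n in range(1, max_n + 1):
        hyp, ref = ngrams(hypothesis, n), ngrams(reference, n)
        if not hyp or not ref:
            continue  # string shorter than n
        overlap = sum((hyp & ref).values())   # clipped n-gram matches
        prec = overlap / sum(hyp.values())
        rec = overlap / sum(ref.values())
        if prec + rec == 0:
            f_scores.append(0.0)
            continue
        f_scores.append((1 + beta**2) * prec * rec / (beta**2 * prec + rec))
    return 100 * sum(f_scores) / len(f_scores) if f_scores else 0.0

print(round(chrf("the cat sat", "the cat sat"), 1))  # identical strings -> 100.0
```

Being purely surface-level and character-based is exactly why chrF beating neural metrics is so surprising.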
In Vienna for ACL, presenting Tulun, a system for low-resource in-domain translation using LLMs
Tuesday @ 4pm

Working with 2 real use cases: medical translation into Tetun 🇹🇱 & disaster relief speech translation in Bislama 🇻🇺
July 27, 2025 at 4:00 PM
Cool paper, at the intersection of grammar and LLM interpretability.

I like that they use linguistic datasets for their experiments, then get results that can contribute to linguistics as a field too! (on structural priming vs L1/L2)
My paper with @tylerachang.bsky.social and @jamichaelov.bsky.social will appear at #ACL2025NLP! The updated preprint is available on arxiv. I look forward to chatting about bilingual models in Vienna!
✨New pre-print✨ Crosslingual transfer allows models to leverage their representations for one language to improve performance on another language. We characterize the acquisition of shared representations in order to better understand how and when crosslingual transfer happens.
June 8, 2025 at 4:47 AM
Our paper on who uses tetun.org, and what for, got published at the LoResMT 2025 workshop! An emotional paper for me, going back to the project that got me into a machine learning PhD in the first place.
May 25, 2025 at 1:11 AM
My favourite ICLR paper so far. Methodology, findings and their implications are all very cool.

In particular Fig. 2 + this discussion point:
May 8, 2025 at 10:20 AM
Incredible paper, finding that large companies can game the LMArena through statistical noise (via many model submissions), over-sampling of their models, and overfitting to Arena-style prompts (without real gains in model reasoning).

The experiments they run to show this are pretty cool too!
It is critical for scientific integrity that we trust our measure of progress.

The @lmarena.bsky.social has become the go-to evaluation for AI progress.

Our release today demonstrates the difficulty in maintaining fair evaluations on the Arena, despite best intentions.
May 2, 2025 at 1:28 PM
Cool summary of issues with multilingual LLM eval, and potential solutions!

If you're doubtful of all these non-reproducible evals on translated multiple choice questions, this paper is for you
📖New preprint with Eleftheria Briakou @swetaagrawal.bsky.social @mziizm.bsky.social @kocmitom.bsky.social!

arxiv.org/abs/2504.11829

🌍It reflects experiences from my personal research journey: coming from MT into multilingual LLM research I missed reliable evaluations and evaluation research…
April 23, 2025 at 9:50 AM
GlotEval - a unified framework for multilingual eval of LLMs, on 7 different tasks, by @tiedeman.bsky.social @helsinki-nlp.bsky.social

Just wish it supported eval of closed models (e.g. through LiteLLM?)

github.com/MaLA-LM/Glot...
April 11, 2025 at 7:41 AM
Reposted by Raphaël Merx
👋 Hey Bluesky!

We’ve just touched down and we’re excited to be here 🌤️🐍

This is the official PyCon AU account, your go-to space for updates, announcements, and all things Python in Australia✨

Hit that follow button and stay tuned because we’ve got some awesome things coming your way!

#PyConAU
March 30, 2025 at 10:30 PM
Reposted by Raphaël Merx
😼 SMOL DATA ALERT! 😼 Announcing SMOL, a professionally-translated dataset for 115 very low-resource languages! Paper: arxiv.org/pdf/2502.12301
Huggingface: huggingface.co/datasets/goo...
February 19, 2025 at 5:36 PM
Reposted by Raphaël Merx
Been hearing a lot about recency bias lately. Must be pretty important
January 15, 2025 at 3:10 AM
Our paper on generating bilingual example sentences with LLMs got best paper award @ ALTA in Canberra!

arxiv.org/abs/2410.03182

We work with French / Indonesian / Tetun, find that annotators don't agree about what's a "good example", but that LLMs can align with a specific annotator.
December 5, 2024 at 3:12 AM
Another example of why we need evals that take clinical risk into account when training NLP models for health
slator.com/openais-whis...
November 22, 2024 at 1:27 PM
Reposted by Raphaël Merx
What does it mean to be a “low-resourced” language? I’ve seen definitions ranging from less training data to low numbers of speakers. Great to see this important clarifying work at #EMNLP2024 from @hellinanigatu.bsky.social et al

aclanthology.org/2024.emnlp-m...
November 15, 2024 at 9:43 PM
this guy lives rent free in my hippocampus
November 20, 2024 at 7:01 AM
Is productionisation (and the move to gimmicks like CoT in o1-preview) at OpenAI and Anthropic a sign that scaling laws are slowing? And if so, where are we headed with LLMs?

Slightly pretentious but enjoyable read: www.generalist.com/briefing/the...
November 20, 2024 at 6:55 AM