Lightnews — Scholar-powered news

Raphaël Merx

@rapha.dev

52 followers 90 following 30 posts

PhD @ UniMelb
NLP, with a healthy dose of MT

Based in 🇮🇩, worked in 🇹🇱 🇵🇬 , from 🇫🇷

Raphaël Merx

@rapha.dev

the paper www2.statmt.org/wmt25/pdf/20...

October 18, 2025 at 5:17 AM

Raphaël Merx

@rapha.dev

They say it's because (1) test sets have become more challenging, (2) include more lang pairs, (3) are longer, and (4) used ESA instead of MQM. But we need an ablation study!

October 18, 2025 at 5:17 AM

Raphaël Merx

@rapha.dev

kudos to whoever came up with that paper name 👌

October 6, 2025 at 8:44 AM

Raphaël Merx

@rapha.dev

paper: aclanthology.org/2025.acl-dem...
demo: youtu.be/fQFwOxzR4MI

Tulun: Transparent and Adaptable Low-resource Machine Translation

Raphael Merx, Hanna Suominen, Lois Yinghui Hong, Nick Thieberger, Trevor Cohn, Ekaterina Vylomova. Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 3: Sy...

aclanthology.org

July 27, 2025 at 4:00 PM

Raphaël Merx

@rapha.dev

Thanks a lot! I didn't make it to Albuquerque unfortunately, but I hope to be in Vienna for ACL. Might see you there?

May 26, 2025 at 2:25 AM

Raphaël Merx

@rapha.dev

Many thanks to Adérito Correia (Timor-Leste INL), and my supervisors Hanna Suominen and Katerina Vylomova!

Paper at aclanthology.org/2025.loresmt..., video presentation at youtu.be/8zenieJWRyg

Low-resource Machine Translation: what for? who for? An observational study on a dedicated Tetun language translation service

Raphael Merx, Adérito José Guterres Correia, Hanna Suominen, Ekaterina Vylomova. Proceedings of the Eighth Workshop on Technologies for Machine Translation of Low-Resource Languages (LoResMT 2025). 20...

aclanthology.org

May 25, 2025 at 1:11 AM

Raphaël Merx

@rapha.dev

(3) The vast majority of usage is on mobile (over 90% of users / over 80k devices)

Takeaway: publishing MT models in mobile apps is probably more impactful than setting up a website / HuggingFace space.

May 25, 2025 at 1:11 AM

Raphaël Merx

@rapha.dev

(2) Translation into Tetun is in higher demand (by >2x) than translation from Tetun

Takeaway for us MT folks: focus on translation into low-res langs, harder but more impactful

May 25, 2025 at 1:11 AM

Raphaël Merx

@rapha.dev

We find that
(1) a LOT of usage is for educational purposes (>50% of translated text)
--> contrasts sharply with Tetun corpora (e.g. MADLAD), dominated by news & religion.

Takeaway: don't evaluate MT on overrepresented domains (e.g. religion)! You risk misrepresenting end-user exp.

May 25, 2025 at 1:11 AM

Raphaël Merx

@rapha.dev

Very interesting findings, particularly the benefit (or lack thereof) of test-time scaling across domains

May 13, 2025 at 12:40 AM

Raphaël Merx

@rapha.dev

AI dev tools. In particular agents: are they hype or useful or both?

March 31, 2025 at 3:20 AM

Raphaël Merx

@rapha.dev

Perceptricon

March 26, 2025 at 8:29 AM

Raphaël Merx

@rapha.dev

The right thing to do, thanks for this *SEM

March 17, 2025 at 8:19 AM

Raphaël Merx

@rapha.dev

Super impactful, thank you for this! A natural sequel of Gatitos.

I'm esp. fond of your "researcher in the loop" method to ensure wide vocab coverage.

February 20, 2025 at 10:23 PM
