But don't draw conclusions just yet - automatic metrics are biased in favor of techniques such as using a metric as a reward model or MBR decoding. The official human ranking will be part of the General MT findings at WMT.
arxiv.org/abs/2508.14909
The Eval Leaderboard is now LIVE! 🏆💻
Our video retrieval collection stumps most pre-trained models. See if you can build a better system!
eval.ai/web/challeng...
"Faux Polyglots: A study on Information Disparity in Multilingual Large Language Models".
Come visit and learn about how multilingual RALMs fail to handle multilingual information conflicts.
Teaser: youtu.be/aPS2Ntav1FE
#LLM #AI #NLProc
Unfortunately, no.
We find LLMs are faux polyglots.
📢Preprint: tinyurl.com/fdunz3dz
#LLMs #NLProc
📢 Check out DialUp, a technique to make your MT model robust to the dialect continua of its training languages, including unseen dialects.
arxiv.org/abs/2501.16581
This channel will serve as the primary communication method between authors, participants, and organizers: join.slack.com/t/magmarshar...