Armel Randy Zebaze
armelrandy.bsky.social
PhD Student @InriaParisNLP
TL;DR
Everything is in the title.

The paper is available on arXiv:
arxiv.org/pdf/2408.00397

The code and outputs are available on GitHub:
github.com/ArmelRandy/I...

Thanks to my co-authors @bensagot.bsky.social and @rachelbawden.bsky.social, and to @inriaparisnlp.bsky.social.

10/10
February 17, 2025 at 5:54 PM
Finally, we demonstrate that similarity-based example selection (from a high-quality selection pool) helps few-shot MT with LLMs ranging from 2 to 70 billion parameters. As the number of in-context examples grows, the gap over random selection remains significant.

9/10
Using FLORES-200 dev set (997 human-written pairs) as our initial selection pool, we study the impact of reducing or expanding it with bitexts from the NLLB dataset. In Swahili, similarity search (notably SONAR) proves more robust to pool composition than random selection.

8/10
SONAR also outperforms example selection based on string-matching metrics like BLEU, BM25, R(rerank)-BM25, and cosine-similarity with RoBERTa's sentence representations.

7/10
Experiments with 5 sentence embeddings on 4 FLORES-200 languages show that similarity-based selection outperforms random selection in LRLs but offers only marginal gains in HRLs (French). In both settings, the sentence embeddings perform similarly, with SONAR slightly ahead.

6/10
We tackle these issues by assigning a zero score to problematic generations, making the metrics language-aware. Specifically, we evaluate with Language-aware COMET, based on COMET-22. It preserves COMET's accuracy while improving the assessment of problematic outputs.
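A minimal sketch of the idea. Here `detect_language` stands in for any language identifier (e.g. a fastText-style LID model) and `comet_score` for the underlying COMET-22 metric; both names are placeholders, not the paper's actual implementation:

```python
def language_aware_scores(sources, hypotheses, references, target_lang,
                          detect_language, comet_score):
    """Score each (source, hypothesis, reference) triple with the base
    metric, but assign 0.0 to problematic outputs: empty hypotheses or
    hypotheses not in the target language."""
    scores = []
    for src, hyp, ref in zip(sources, hypotheses, references):
        if not hyp.strip() or detect_language(hyp) != target_lang:
            scores.append(0.0)  # penalise empty / wrong-language outputs
        else:
            scores.append(comet_score(src, hyp, ref))
    return scores
```

On well-formed outputs the wrapper is a no-op, so the base metric's accuracy is preserved; only degenerate generations are pushed to zero.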

5/10
Translating into low-resource languages presents two main challenges:
• Outputs may be in the wrong language (e.g., repeating the prompt).
• They may be empty or contain meaningless repetitions.
Current neural metrics are not robust to these issues.

4/10
We examine three aspects:
• Evaluating LLM-based MT into LRLs.
• Assessing whether similarity-based example selection improves MT, especially with a small pool (typical for LRLs), and at scale.
• Testing the strategy’s robustness to selection pool heterogeneity.

3/10
We explore in-context example selection for MT, focusing on LRLs (e.g. Swahili, Wolof). Given a sentence to translate and a selection pool, we choose the k closest pairs according to a sentence embedding or a string-matching metric, placing the most similar pair closest to the sentence in the prompt.
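A minimal sketch of this selection step. The embeddings here are toy vectors standing in for the outputs of a sentence-embedding model such as SONAR (which this example does not load):

```python
import numpy as np

def select_examples(query_emb, pool_embs, pool_pairs, k=3):
    """Return the k pool pairs whose source-side embeddings are most
    similar (cosine) to the query embedding, ordered so that the most
    similar pair comes last, i.e. closest to the input sentence when
    the prompt is assembled."""
    # After L2 normalisation, cosine similarity is a dot product.
    q = query_emb / np.linalg.norm(query_emb)
    p = pool_embs / np.linalg.norm(pool_embs, axis=1, keepdims=True)
    sims = p @ q
    top_k = np.argsort(sims)[-k:]  # ascending: most similar index last
    return [pool_pairs[i] for i in top_k]
```

With a string-matching metric instead of embeddings, only the `sims` computation changes; the ranking and prompt ordering stay the same.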

2/10