Armel Randy Zebaze
@armelrandy.bsky.social
PhD Student @InriaParisNLP
🎉 Happy to share that 2 of our papers were accepted to #EMNLP2025 Findings! 🚀
[1] Compositional Translation: A Novel LLM-based Approach for Low-resource Machine Translation
[2] TopXGen: Topic-Diverse Parallel Data Generation for Low-Resource Machine Translation

Thank you to my amazing co-authors! 🙌
August 21, 2025 at 4:26 PM
Finally, we demonstrate that similarity-based example selection (from a high-quality sample pool) helps few-shot MT with LLMs ranging from 2 to 70 billion parameters. As the number of in-context examples grows, the gap over random selection remains significant.

9/10
February 17, 2025 at 5:54 PM
Using FLORES-200 dev set (997 human-written pairs) as our initial selection pool, we study the impact of reducing or expanding it with bitexts from the NLLB dataset. In Swahili, similarity search (notably SONAR) proves more robust to pool composition than random selection.

8/10
February 17, 2025 at 5:54 PM
SONAR also outperforms example selection based on string-matching metrics such as BLEU, BM25, and R(rerank)-BM25, as well as cosine similarity over RoBERTa's sentence representations (a BM25 baseline sketch follows below).

7/10
February 17, 2025 at 5:54 PM
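For concreteness, here is a minimal sketch of a BM25-style selection baseline like the one compared above, using the rank_bm25 package; the pool, the whitespace tokenizer, and the value of k are illustrative assumptions, not the paper's exact setup.

```python
# Minimal sketch of a BM25 example-selection baseline (illustrative, not the
# paper's exact setup): score pool source sentences against the query and
# keep the top-k pairs. Requires the rank_bm25 package (pip install rank-bm25).
from rank_bm25 import BM25Okapi

# Hypothetical selection pool of (source, target) pairs, e.g. English-Swahili.
pool = [
    ("The weather is nice today.", "Hali ya hewa ni nzuri leo."),
    ("Where is the market?", "Soko liko wapi?"),
    ("I am learning to cook.", "Ninajifunza kupika."),
]

# Whitespace tokenization is a simplification for the sketch.
tokenized_sources = [src.lower().split() for src, _ in pool]
bm25 = BM25Okapi(tokenized_sources)

query = "Where is the nearest market?"
scores = bm25.get_scores(query.lower().split())

# Indices of the k highest-scoring pool pairs.
k = 2
top_k = sorted(range(len(pool)), key=lambda i: scores[i], reverse=True)[:k]
print([pool[i] for i in top_k])
```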
Experiments with 5 sentence embeddings on 4 FLORES-200 languages show that similarity-based selection outperforms random selection in LRLs but offers only marginal gains in HRLs (French). In both settings the sentence embeddings perform similarly, with SONAR slightly ahead (embedding-based sketch below).

6/10
February 17, 2025 at 5:54 PM
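A rough sketch of the embedding-based selection these experiments compare, using LaBSE via sentence-transformers as a stand-in encoder (the paper's pool of embedders includes SONAR, which isn't loaded here); model choice and data are assumptions for illustration, and swapping the model name swaps the embedding space.

```python
# Sketch of embedding-based example selection: rank the pool by cosine
# similarity to the query in a multilingual embedding space. LaBSE is a
# stand-in encoder; the thread's experiments compare five embedders.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/LaBSE")

pool_sources = [
    "The weather is nice today.",
    "Where is the market?",
    "I am learning to cook.",
]
query = "Where is the nearest market?"

# With normalized embeddings, cosine similarity is a plain dot product.
pool_emb = model.encode(pool_sources, normalize_embeddings=True)
query_emb = model.encode([query], normalize_embeddings=True)[0]

similarities = pool_emb @ query_emb
ranking = np.argsort(-similarities)  # most similar first
print([(pool_sources[i], float(similarities[i])) for i in ranking])
```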
Translating into low-resource languages presents two main challenges:
• Outputs may be in the wrong language (e.g., repeating the prompt).
• They may be empty or contain meaningless repetitions.
Current neural metrics are not robust to these issues (a minimal detection sketch follows after this post).

4/10
February 17, 2025 at 5:54 PM
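A minimal sketch of how such degenerate outputs could be flagged heuristically; the thresholds and the use of langdetect for language ID are my assumptions, not checks from the paper.

```python
# Heuristic flags for the degenerate MT outputs described above: empty
# output, source/prompt echoing, n-gram repetition loops, and wrong output
# language. Thresholds and the langdetect dependency are illustrative
# assumptions (pip install langdetect).
from langdetect import detect
from langdetect.lang_detect_exception import LangDetectException

def flag_output(source: str, output: str, expected_lang: str = "sw") -> list[str]:
    flags = []
    text = output.strip()
    if not text:
        return ["empty"]
    if source.strip().lower() in text.lower():
        flags.append("echoes_source")
    # Degenerate repetition: few unique 3-grams relative to total 3-grams.
    tokens = text.split()
    trigrams = list(zip(tokens, tokens[1:], tokens[2:]))
    if trigrams and len(set(trigrams)) / len(trigrams) < 0.5:
        flags.append("repetitive")
    try:
        if detect(text) != expected_lang:
            flags.append("wrong_language")
    except LangDetectException:
        flags.append("undetectable_language")
    return flags

# Flags echoing, repetition, and wrong language for this toy output.
print(flag_output("Hello.", "Hello. Hello. Hello. Hello. Hello. Hello."))
```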
We explore in-context example selection for MT, focusing on LRLs (e.g., Swahili, Wolof). Given a sentence and a selection pool, we pick the k closest pairs according to a sentence embedding or a string-matching metric, placing the most similar pair closest to the sentence (see the sketch below).

2/10
February 17, 2025 at 5:54 PM
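A dependency-free sketch of this selection-and-ordering recipe, with difflib string similarity standing in for the embeddings and retrieval metrics actually studied; the prompt template and pool are illustrative assumptions.

```python
# Sketch of the core recipe: pick the k pool pairs closest to the test
# sentence and lay them out in the prompt with the most similar pair last,
# i.e. adjacent to the sentence to translate. difflib's string similarity
# is a stand-in for the sentence embeddings / retrieval metrics studied.
from difflib import SequenceMatcher

pool = [
    ("The weather is nice today.", "Hali ya hewa ni nzuri leo."),
    ("Where is the market?", "Soko liko wapi?"),
    ("I am learning to cook.", "Ninajifunza kupika."),
]

def build_prompt(sentence: str, pool, k: int = 2) -> str:
    # Sort ascending by similarity, then keep the last k, so the most
    # similar example ends up directly above the sentence to translate.
    scored = sorted(pool, key=lambda pair: SequenceMatcher(None, sentence, pair[0]).ratio())
    selected = scored[-k:]
    lines = [f"English: {src}\nSwahili: {tgt}" for src, tgt in selected]
    lines.append(f"English: {sentence}\nSwahili:")
    return "\n\n".join(lines)

print(build_prompt("Where is the nearest market?", pool))
```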
I am happy to announce that our paper "In-context Example Selection via Similarity Search Improves Low-resource Machine Translation" was accepted to #NAACL2025 Findings 🤩🔥.

What is this about?

TAGS: Machine Translation (MT), high-/low-resource languages (HRLs/LRLs).
🧵

1/10
February 17, 2025 at 5:54 PM