Ona de Gibert
onadegibert.bsky.social
Ona de Gibert
@onadegibert.bsky.social
PhD Student @HelsinkiNLP / Low-resource, Machine Translation, Knowledge Distillation, Multilinguality
See you next week at EMNLP!
We will be presenting our work: Scaling Low-Resource MT via Synthetic Data Generation with LLMs

📍 Poster Session 13
📅 Fri, Nov 7, 10:30-12:00 - Hall C
📖 Check it out! arxiv.org/abs/2505.14423

@helsinki-nlp.bsky.social @cambridgenlp.bsky.social @emnlpmeeting.bsky.social
Scaling Low-Resource MT via Synthetic Data Generation with LLMs
We investigate the potential of LLM-generated synthetic data for improving low-resource Machine Translation (MT). Focusing on seven diverse target languages, we construct a document-level synthetic co...
arxiv.org
October 28, 2025 at 8:16 AM
Last week I was at @aclmeeting.bsky.social ! Lots of friendly faces, great work and amazing art ✨️ We presented HPLT v2 datasets together with @very-laurie.bsky.social 🎉 Read our paper here: aclanthology.org/2025.acl-lon...
August 3, 2025 at 9:06 AM
NAACL was a blast 💥 Presented the findings of our Shared Tasks at @americasnlp.bsky.social, had a chance to reconnect with old friends, make new ones, and get excited about research I'm passionate about. #NAACL25
May 5, 2025 at 3:08 PM
Reposted by Ona de Gibert
I'm part of this! There's also a paper: arxiv.org/abs/2503.10267
** New parallel data set ** . We've just released HPLT v2.0, a parallel data set of 50 languages paired with English, 380M sentence pairs in total. Extracted from the Internet Archive and Common Crawl hplt-project.org/datasets/v2.0
HPLT - High Performance Language Technologies
A space that combines petabytes of natural language data with large-scale model training
hplt-project.org
March 17, 2025 at 1:27 PM
Come to Helsinki for the 18th MT Marathon! Sponsored by EAMT @ufal-cuni.bsky.social
March 18, 2025 at 1:10 PM
That's a wrap for @nodalida.bsky.social ! Short, nice and intense. I presented our work on efficient MT @helsinki-nlp.bsky.social within the #HPLT project⚡️
March 5, 2025 at 9:59 AM
Reposted by Ona de Gibert
** New parallel data set ** . We've just released HPLT v2.0, a parallel data set of 50 languages paired with English, 380M sentence pairs in total. Extracted from the Internet Archive and Common Crawl hplt-project.org/datasets/v2.0
HPLT - High Performance Language Technologies
A space that combines petabytes of natural language data with large-scale model training
hplt-project.org
February 28, 2025 at 1:34 PM
🚀 Just added our HPLT fast translation models to a new TranslateLocally repository! Translate on your own machine—fast, private, and easy. shorturl.at/R2vzw
20+ models for diverse languages—learn more about them next week at @nodalida.bsky.social!
February 24, 2025 at 9:04 AM
Reposted by Ona de Gibert
I am so excited to share with you all the 2025 edition of our #AmericasNLP workshop!
Do not hesitate in submitting your amazing research paper on indigenous and low resource languages.
Submission deadline: March 7, 2025
turing.iimas.unam.mx/americasnlp/...
January 30, 2025 at 3:17 PM
Reposted by Ona de Gibert
There is an open position for a postdoc in our lab in this call. The focus will be on scalable modular NLP with funding from the ERC PoC project MARMoT
February 7, 2025 at 2:12 PM