Head of Helsinki-NLP @helsinki-nlp.bsky.social
Member of the Ellis unit Helsinki @ellisfinland.bsky.social
We will be presenting our work: Scaling Low-Resource MT via Synthetic Data Generation with LLMs
📍 Poster Session 13
📅 Fri, Nov 7, 10:30-12:00 - Hall C
📖 Check it out! arxiv.org/abs/2505.14423
@helsinki-nlp.bsky.social @cambridgenlp.bsky.social @emnlpmeeting.bsky.social
- Paper submission: Dec 19
- Commitment for pre-reviewed papers: Jan 2
- Acceptance notifs: Jan 23
- Camera-ready: Feb 3
- Workshop: TBD (Mar 24-29)
Organizers:
Yves Scherrer, Noëmi Aepli, @tosaja.bsky.social, Nikola Ljubešić, Preslav Nakov, @tiedeman.bsky.social, Marcos Zampieri & me
Big thanks to my supervisor @tiedeman.bsky.social, who will be presenting our poster. Come say hi!
⚙️ Trained on 100B tokens from the HPLT v2 dataset
🌍 Cover EU langs + others
⚙️ Based on LLaMA, trained on #LUMI
📈 Useful for evaluation
Downloads + more info at openeurollm.eu/blog/hplt-oe...
Part of the [**MaLA Corpus**](huggingface.co/collections/...), this dataset is built from [OPUS](opus.nlpl.eu) (cutoff Oct 2024) and features **16,829 language pairs** after deduplication, normalization, and noise filtering
- Ayodele Awokoya
- Wilker Aziz
- Marta Costa-Jussa
- Barry Haddow
- Amit Moryossef
- Sara Papi
- Jörg Tiedemann
- Marco Turchi
See here: www.nodalida-bhlt2025.eu/proceedings
See you soon in Tallinn!
#NLP #NLProc #nodalida #baltichlt
OPEN = open-source
Euro = under EU regulations, representing EU values
LLM = LLMs
openeurollm.eu