Institute of Formal and Applied Linguistics
ufal.mff.cuni.cz
@ufal.mff.cuni.cz
Computational linguistics • Natural language processing • Formal linguistics • Machine translation | at Faculty of Mathematics and Physics, Charles University
📏 CUNI and Phrase at WMT25 MT Evaluation Task
Miroslav Hrabal, Ondrej Glembek, Aleš Tamchyna, Almut Silja Hildebrand, Alan Eckhard, Miroslav Štola, Sergio Penkale, Zuzana Šimečková, Ondřej Bojar, Alon Lavie, Craig Stewart
aclanthology.org/2025.wmt-1.68
Miroslav Hrabal, Ondrej Glembek, Aleš Tamchyna, Almut Silja Hildebrand, Alan Eckhard, Miroslav Štola, Sergio Penkale, Zuzana Šimečková, Ondřej Bojar, Alon Lavie, Craig Stewart. Proceedings of the Tent...
November 11, 2025 at 2:37 PM
🇨🇿 CUNI at WMT25 General Translation Task
Josef Jon, Miroslav Hrabal, Martin Popel, Ondřej Bojar
aclanthology.org/2025.wmt-1.44
Our submission to the WMT25 translation shared task showcases CUNI's latest approaches to general-purpose machine translation across multiple language pairs.
Josef Jon, Miroslav Hrabal, Martin Popel, Ondřej Bojar. Proceedings of the Tenth Conference on Machine Translation. 2025.
November 11, 2025 at 2:37 PM
🔤 Pretraining Language Models with LoRA and Artificial Languages
Nalin Kumar, Mateusz Lango, @tuetschek.bsky.social
aclanthology.org/2025.babylm-...
How constructed artificial languages combined with LoRA affect language model development.
Nalin Kumar, Mateusz Lango, Ondrej Dusek. Proceedings of the First BabyLM Workshop. 2025.
November 11, 2025 at 2:37 PM
🎓 You are an LLM teaching a smaller model everything you know: Multi-task pretraining of language models with LLM-designed study plans
Wiktor Kamzela, Mateusz Lango, @tuetschek.bsky.social
aclanthology.org/2025.babylm-...
Wiktor Kamzela, Mateusz Lango, Ondrej Dusek. Proceedings of the First BabyLM Workshop. 2025.
November 11, 2025 at 2:37 PM
🌍 Towards Adding Arabic to CorefUD
Dima Taji and Dan Zeman
aclanthology.org/2025.crac-1.6
Expanding the CorefUD universal coreference dataset to Arabic - taking important steps toward truly multilingual coreference resolution resources and better Arabic NLP.
November 11, 2025 at 2:37 PM
📊 Real-World Summarization: When Evaluation Reaches Its Limits
@patuchen.bsky.social, @tuetschek.bsky.social, @saad.me.uk
aclanthology.org/2025.finding...
For hotel highlights, simple metrics like word overlap surprisingly match human judgments better than more complex methods. LLMs prove unreliable as evaluators.
Patrícia Schmidtová, Ondrej Dusek, Saad Mahamood. Findings of the Association for Computational Linguistics: EMNLP 2025. 2025.
November 7, 2025 at 8:54 PM
👥 Can Large Language Models Personalize Dialogues to Generational Styles?
P. Balestrucci, @tuetschek.bsky.social, L. Anselma, A. Mazzei
aclanthology.org/2025.finding...
Can LLMs adapt dialogues to generational styles? We show with P-MultiWoZ that models capture patterns from Boomers to Gen Z.
Pier Felice Balestrucci, Ondrej Dusek, Luca Anselma, Alessandro Mazzei. Findings of the Association for Computational Linguistics: EMNLP 2025. 2025.
November 7, 2025 at 8:54 PM
🤖 LLM Agents Implement an NLG System from Scratch
Mateusz Lango, Ondrej Dusek
aclanthology.org/2025.emnlp-i...
LLM agents can autonomously build interpretable, rule-based RDF-to-text generators from scratch, combining the capabilities of LLMs with the transparency and reliability of traditional rule-based systems.
LLM Agents Implement an NLG System from Scratch: Building Interpretable Rule-Based RDF-to-Text Generators
Mateusz Lango, Ondrej Dusek. Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track. 2025.
November 7, 2025 at 8:54 PM
📚 SRS-Stories: Vocabulary-constrained multilingual story generation for language learning
Wiktor Kamzela, Mateusz Lango & @toonietuesday.bsky.social
aclanthology.org/2025.emnlp-i...
LLM-generated stories teach new vocabulary while reviewing already learned words via spaced repetition, and they are more grammatical than standard generation.
November 7, 2025 at 8:54 PM
The tools developed by Pavel and Jan are used by several oral history collections around the world, including the USC Holocaust testimonies and the Center for Visual History Malach at UFAL. Their general ASR tools are also available at lindat.mff.cuni.cz/services/uwe....
UWebASR - University of West Bohemia : Automatic Speech Recognition Service
October 1, 2025 at 8:43 AM
The summer school was led by Ondřej Bojar, Zdeněk Kasner, Tomáš Polák, Dominik Macháček, Miroslav Hrabal, and Josef Jon.
September 29, 2025 at 10:05 AM
In his speech titled "Indirect Objects across Languages: A Trap in Universal Dependencies?" he discussed the challenges of the UD framework in relation to traditional language descriptions. He highlighted the ambiguities in UD guidelines and their impact on annotation practices.
September 24, 2025 at 7:49 AM
ufal.mff.cuni.cz/grants/human...
The project will examine how large language models and other artificial intelligence technologies can contribute to democratic dialogue, education, and communication between people.

Implementation period: March 1, 2025 – December 31, 2028. Funding: OP JAK: Social Sciences and Humanities 2/2
September 23, 2025 at 3:00 PM