Institute of Formal and Applied Linguistics
ufal.mff.cuni.cz
@ufal.mff.cuni.cz
Computational linguistics • Natural language processing • Formal linguistics • Machine translation | at Faculty of Mathematics and Physics, Charles University
🌍 Towards Adding Arabic to CorefUD
Dima Taji and Dan Zeman
aclanthology.org/2025.crac-1.6
Expanding the CorefUD universal coreference dataset to Arabic - taking important steps toward truly multilingual coreference resolution resources and better Arabic NLP.
November 11, 2025 at 2:37 PM
EMNLP 2025 is over... and Milan Straka is bringing home an award! 🏆
CorPipe triumphed in the prestigious CRAC25 Shared Task, focusing on multilingual coreference resolution.

Did Milan just CRACk it? We certainly think so! 😉

🔗 Find out more at arxiv.org/abs/2509.17858

#EMNLP2025 #CorPipe #CRAC25
November 11, 2025 at 1:49 PM
The EU's 🇪🇺 HPLT project, coordinated by @ufal.mff.cuni.cz, is at #EMNLP2025! The project supported the conference as a silver sponsor, disseminating HPLT results from our booth and through several papers. We'll continue to shape the future of multilingual datasets and models here and in @openeurollm.bsky.social!
November 7, 2025 at 9:03 PM
📚 SRS-Stories: Vocabulary-constrained multilingual story generation for language learning
Wiktor Kamzela, Mateusz Lango & @toonietuesday.bsky.social
aclanthology.org/2025.emnlp-i...
LLM stories teach vocab while reviewing learned words via Spaced Repetition, and are more grammatical than standard generation
November 7, 2025 at 8:54 PM
Excited to share our work at #EMNLP2025! Our team is presenting 12 papers across the main conference and workshops, covering multilingual NLG, LLM agents, coreference resolution, and machine translation.
A thread with highlights 🧵👇
November 7, 2025 at 8:54 PM
We invite you to today's lecture at Jazykovědné sdružení, given by prof. PhDr. Eva Hajičová, DrSc., starting at 17:30.

🔗 You can attend in person or watch on Zoom: lnkd.in/eQeST-uG

Lecture topic: Topic-Focus Articulation in the Age of Parallel Corpora

📸 Photo: Vladimír Šigut, Charles University
October 23, 2025 at 9:02 AM
🚀 PROJECT LAUNCH: Infoveillance is Live! Our AI tool monitors digital media to detect misinformation and enhance public trust/literacy. Fighting infodemics & polarization.

https://ufal.mff.cuni.cz/grants/infoveillance
#Infoveillance #AI #Misinformation #PublicTrust #UFAL
October 2, 2025 at 11:46 AM
Six colleagues ran a three-day summer school in Luxembourg for DGT (the European Commission's Directorate-General for Translation). They taught 40+ DGT staff the latest methods in machine-assisted translation and quality assurance. The goal? To make the translation of EU legislation into all member-state languages more efficient!
#DGT #UFAL #StrojovyPreklad #AI #EUTools
September 29, 2025 at 9:31 AM
@Dan Zeman has been invited as a keynote speaker at the ICLC 11 conference! iclc11.ff.cuni.cz/keynote-spea...

#UFAL #ICLC11 #UniversalDependencies #CharlesUniversity #Prague
September 24, 2025 at 7:48 AM
Take a look at the kick-off meeting of the project ✨HumanAId: Human-Centred AI for a Sustainable and Adaptable Society✨.

A project with strong participation: it is led by FFUK in cooperation with MFF UK, FSV UK, PF UP in Olomouc, FÚ AV ČR, prg.ai, and Kampus Hybernská.

#prgAI #HumanAId #OPJAK
1/2
September 23, 2025 at 2:58 PM
And another successfully defended thesis: 👉Dr.👈 Kira Droganova defended her thesis: Dependency Parsing beyond Simple Trees, which focused on enriching syntactic parsing with deeper semantic layers to better capture meaning across languages. Congratulations 🥳
September 23, 2025 at 11:07 AM
🎉 Congratulations to 👉Dr.👈 Tomáš Musil on successfully defending his PhD thesis! 🍻 His talk explored #LLMs, theories of meaning, and their role in LLM #interpretability, highlighting unsupervised discovery of binary semantic features via ICA and the word intruder test.
September 22, 2025 at 10:09 AM
The workshop "Regulation, AI and the Legal Profession: Behind the Scenes of Legislation and Legal Innovation" presented OpenEuroLLM as a hope for European digital sovereignty and a necessity for Europe's competitiveness. Jan Hajič stressed that Czechia is working to reduce bureaucracy in the area of AI.

#AI #AIregulation #FutureOfLaw
September 19, 2025 at 2:45 PM
Researchers' Night with @informatfyz.cuni.cz!
Come to a live podcast recording and try out ELITR, a real-time automatic interpreting system. The event is on September 26th.

🔗 czechia.representation.ec.europa.eu/evropsky-den...

#ELITR #AI #Interpreting #MachineTranslation #LanguageTech
September 18, 2025 at 10:19 AM
Gold Data and Multiple Understanding of Discourse Relations
by Š. Zikánová, A. Nedoluzhko, J. Mírovský & E. Hajičová
TL;DR: Investigate how annotators interpret discourse relations differently, revealing important insights about subjectivity in linguistic annotation and its impact on NLP systems.
September 1, 2025 at 2:29 PM
Morphological Segmentation with Neural Networks: Performance Effects of Architecture, Data Size, and Cross-Lingual Transfer in Seven Languages
by M. Olbrich & Z. Žabokrtský
TL;DR: Analyzed neural architectures, data size, and cross-lingual transfer for morphological segmentation for 7 languages.
September 1, 2025 at 2:29 PM
Flexing in 73 Languages: A Single Small Model for Multilingual Inflection
by Tomáš Sourada & Jana Straková
TL;DR: Compact neural model successfully handles morphological inflection across 73 diverse languages, proving that small can be mighty in multilingual NLP.
September 1, 2025 at 2:29 PM
Refining Czech GEC: Insights from a Multi-Experiment Approach
by P. Pechman, @straka-milan.bsky.social, @janastrakova.bsky.social, J. Náplava
TL;DR: Better Czech grammatical error correction systems + insights for better automated writing assistance in Czech arxiv.org/abs/2506.22402
September 1, 2025 at 2:29 PM
Investigating the Effect of Parallel Data in the Cross-Lingual Transfer for Vision-Language Encoders
by @andrei-a-manea.bsky.social & @jlibovicky.bsky.social
TL;DR: Explore how parallel datasets improve cross-lingual transfer in vision-language models. arxiv.org/abs/2504.21681
September 1, 2025 at 2:29 PM
ParCzech 3.0: A Large Czech Speech Corpus with Rich Metadata
by M. Kopp, V. Stankov, J. O. Krůza, P. Straňák & O. Bojar
TL;DR: Czech parliamentary speeches from 2013-2021 with rich metadata incl. speaker identities, political affiliations, and automatic linguistic annotations in TEI format.
September 1, 2025 at 2:29 PM
Automated Speaking Assessment for L2 Learners of Czech
by Peter Polák, Michal Novák, Kateřina Rysová, Magdaléna Rysová & Ondřej Bojar
TL;DR: An automated system to evaluate the Czech speaking skills of second language learners, making language assessment more accessible and consistent.
September 1, 2025 at 2:29 PM
Great week at #TSD2025 in Erlangen! Our @ufal-cuni.bsky.social team presented 7 papers covering various topics from Czech speech assessment to multilingual morphology. Thanks to all attendees who engaged with our work! 🧵👇
#NLP #ComputationalLinguistics #CzechNLP #MachineLearning
September 1, 2025 at 2:29 PM
Our team of 10 is at #MTMarathon2025 in Helsinki 🇫🇮, a week-long meeting of machine translation researchers and developers.
✅ Posters presented
✅ Now working on cool collaborative projects with researchers from around the world.
#MachineTranslation #NLP
August 26, 2025 at 8:05 AM
We are hosting a summer school "Data Literacy with R for Students of Humanities" at Malá Strana, August 4-15. ufal.mff.cuni.cz/events/summe...

#DataLiteracy #Humanities #Matfyz #UFAL #CharlesUniversity
August 7, 2025 at 1:11 PM
Atyaephyra at SemEval-2025 Task 4: Low-Rank Negative Preference Optimization
arxiv.org/abs/2503.13690
by Jan Bronec and @jindrahelcl.bsky.social
Negative preference optimization with LoRA for LLM unlearning, using efficient regularization to exceed SemEval 2025 baseline performance.
August 1, 2025 at 1:37 PM