@ltgoslo.bsky.social
The Language Technology Group (LTG) at the University of Oslo, Norway, does research on a range of topics in Natural Language Processing (NLP), including language modeling for Norwegian and other languages.
4. #BabyLM challenge description paper, co-authored by Lucas Georges Gabriel Charpentier

babylm.github.io
October 21, 2025 at 3:28 PM
3. "EdinHelsOW WMT 2025 CreoleMT System Description: Improving Lusophone Creole Translation through Data Augmentation, Model Merging and LLM Post-editing" by Jacqueline Rowe, Ona de Gibert, Mateusz Klimaszewski, Coleman Haley, Alexandra Birch and Yves Scherrer
(proc. of WMT)
www2.statmt.org/wmt25/
WMT 2025
October 21, 2025 at 3:28 PM
2. "Improved Norwegian Bokmål Translations for FLORES" by Petter Mæhlum, Anders Næss Evensen and Yves Scherrer
(in proceedings of the WMT 2025 workshop)
www2.statmt.org/wmt25/
WMT 2025
October 21, 2025 at 3:27 PM
1. "Explaining novel senses using definition generation with open language models" by Mariia Fedorova, Andrey Kutuzov, Francesco Periti, Yves Scherrer
(in EMNLP Findings)
arxiv.org/abs/2509.26181
Explaining novel senses using definition generation with open language models
We apply definition generators based on open-weights large language models to the task of creating explanations of novel senses, taking target word usages as an input. To this end, we employ the datas...
October 21, 2025 at 3:26 PM
5. "Systematic Generalization in Language Models Scales with Information Entropy" by Sondre Wold, Lucas Charpentier, Étienne Simon arxiv.org/abs/2505.13089 (ACL Findings)

See you in Vienna!
(end of 🧵)
Systematic Generalization in Language Models Scales with Information Entropy
Systematic generalization remains challenging for current language models, which are known to be both sensitive to semantically similar permutations of the input and to struggle with known concepts pr...
June 10, 2025 at 8:26 AM
4. "NorEval: A Norwegian Language Understanding and Generation Evaluation Benchmark" by Vladislav Mikhailov, Tita Enstad, David Samuel, Hans Christian Farsethås, Andrey Kutuzov, Erik Velldal, and Lilja Øvrelid
arxiv.org/abs/2504.07749 (ACL Findings)
NorEval: A Norwegian Language Understanding and Generation Evaluation Benchmark
This paper introduces NorEval, a new and comprehensive evaluation suite for large-scale standardized benchmarking of Norwegian generative language models (LMs). NorEval consists of 24 high-quality hum...
June 10, 2025 at 8:25 AM
3. "Re-identification of De-identified Documents with Autoregressive Infilling" by Lucas Charpentier and Pierre Lison
arxiv.org/abs/2505.12859 (main ACL)
Re-identification of De-identified Documents with Autoregressive Infilling
Documents revealing sensitive information about individuals must typically be de-identified. This de-identification is often done by masking all mentions of personally identifiable information (PII), ...
June 10, 2025 at 8:23 AM
2. "Circuit Compositions: Exploring Modular Structures in Transformer-Based Language Models" by Philipp Mondorf, Sondre Wold (LTG), and Barbara Plank
arxiv.org/abs/2410.01434 (main ACL)
Circuit Compositions: Exploring Modular Structures in Transformer-Based Language Models
A fundamental question in interpretability research is to what extent neural networks, particularly language models, implement reusable functions through subnetworks that can be composed to perform mo...
June 10, 2025 at 8:22 AM
1. "An Expanded Massive Multilingual Dataset for High-Performance Language Technologies (HPLT)". LTG co-authors: Nikolay Arefyev, Mariia Fedorova, Andrey Kutuzov, Petter Mæhlum, Vladislav Mikhailov, Stephan Oepen, David Samuel and many others from hplt-project.org
arxiv.org/abs/2503.10267 (main ACL)
HPLT - High Performance Language Technologies
A space that combines petabytes of natural language data with large-scale model training
June 10, 2025 at 8:20 AM
You can find all our papers in the @nodalida.bsky.social proceedings:
dspace.ut.ee/items/5b6a0e...
March 3, 2025 at 11:38 AM
📄 Multi-label Scandinavian Language Identification (SLIDE), by Fedorova et al.

📄 Interactive maps for corpus-based dialectology, by Scherrer et al.
March 3, 2025 at 11:32 AM
📄 NorEventGen: generative event extraction from Norwegian news, by You et al.

📄 Mixed Feelings: Cross-Domain Sentiment Classification of Patient Feedback, by Rønningstad et al.
March 3, 2025 at 11:31 AM
📄 Large Language Models for Small Languages: A Study of Continual Pretraining on Languages of Norway, by Samuel et al.

📄 Benchmarking Abstractive Summarisation: A Dataset of Human-authored Summaries of Norwegian News Articles, by Touileb et al.
March 3, 2025 at 11:31 AM
📄 A Collection of Question Answering Datasets for Norwegian, by Mikhailov et al.

📄 The Impact of Copyrighted Material on Large Language Models: A Norwegian Perspective, by de la Rosa et al.
March 3, 2025 at 11:30 AM