Paul Lerner
@lernerp.bsky.social
Postdoc @mlia_isir@sciences.re (Sorbonne Université, CNRS, ISIR)
/ Teacher @ aivancity
/ Teaching Assistant @ Sorbonne Université

https://paullerner.github.io/
Reposted by Paul Lerner
Accepted to a Workshop (1/2):

"Self-Retrieval from Distant Contexts for Document-Level Machine Translation", accepted to the Conference on Machine Translation (WMT25), from @ziqianpeng.bsky.social, @rachelbawden.bsky.social, @yvofr.bsky.social
October 28, 2025 at 8:57 AM
There are many directions this could go in, depending on your profile: multilingual, low-resource languages, interpretability. The internship may lead to a PhD, provided we get funding!
November 6, 2025 at 9:07 AM
As we found in aclanthology.org/2025.coling-..., BPE-based LLMs (i.e. pretty much all LLMs) do not handle prefixation well
Unlike “Likely”, “Unlike” is Unlikely: BPE-based Segmentation hurts Morphological Derivations in LLMs
Paul Lerner, François Yvon. Proceedings of the 31st International Conference on Computational Linguistics. 2025.
aclanthology.org
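For intuition, here is a minimal sketch of the segmentation issue, assuming GPT-2's BPE tokenizer; the word list is illustrative, not the paper's actual evaluation data.

```python
# Minimal sketch: how a BPE tokenizer segments base vs. prefixed words.
# Assumption: GPT-2's vocabulary; any BPE-based tokenizer could be substituted.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
for word in ["likely", "unlikely", "official", "unofficial", "sense", "nonsense"]:
    # GPT-2's BPE is sensitive to a leading space, so show both in-context variants.
    print(f"{word:12s} {tok.tokenize(word)}  |  {tok.tokenize(' ' + word)}")
```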
November 6, 2025 at 9:06 AM
Basically the idea is to extend www.pnas.org/doi/10.1073/... to see how well LLMs model competition between affixes, not only suffixes (e.g. -ity vs. -ness) but also prefixes (e.g. un- vs. non-)
Derivational morphology reveals analogical generalization in large language models | PNAS
What mechanisms underlie linguistic generalization in large language models (LLMs)? This question has attracted considerable attention, with most s...
www.pnas.org
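As a rough illustration of what "competition between prefixes" could look like in practice, here is a hedged sketch comparing the total log-probability a causal LM assigns to minimal pairs that differ only in their negation prefix. The model name, carrier sentence and word pairs are assumptions for illustration, not the protocol of the PNAS paper or of our work.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")  # assumption: any causal LM works here
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def total_logprob(sentence: str) -> float:
    """Total log-probability of a sentence under the causal LM."""
    ids = tok(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss  # mean NLL per predicted token
    return -loss.item() * (ids.size(1) - 1)  # undo the mean to get a total

# Which negation prefix does the model find more probable for the same base?
for un_form, non_form in [("unofficial", "nonofficial"), ("unspecific", "nonspecific")]:
    s_un, s_non = f"It was very {un_form}.", f"It was very {non_form}."
    print(f"{un_form}: {total_logprob(s_un):.2f}  vs  {non_form}: {total_logprob(s_non):.2f}")
```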
November 6, 2025 at 9:04 AM
work done with @yvofr.bsky.social as part of the Democratic Commons programme, many thanks to our colleagues at Make, Sciences Po, and Sorbonne! about.make.org/democratic-c...
about.make.org
October 23, 2025 at 4:16 PM
Here's what one example of the dataset looks like; there are 72,234 just like this one (I miss my multimodal days, when there were pictures in my papers)
October 23, 2025 at 4:09 PM
I aimed for a Pythonic library; have a look at the example notebook colab.research.google.com/github/PaulL...
Google Colab
colab.research.google.com
October 15, 2025 at 5:27 PM
🤔 ppllm is benchmarked against:
- a vLLM-based implementation: 4.15 times faster!
- a naive Hugging Face implementation, which does not sort texts by length (see the sketch below): 4.61 times faster!
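A toy sketch of why sorting texts by length before batching matters (my assumption about where the naive baseline loses time): batches of similar-length texts are padded to a much shorter maximum, so fewer padding tokens are wasted.

```python
# Toy illustration (not ppllm's code): padding cost with and without length sorting.
texts = ["short", "a somewhat longer sentence here", "tiny", "another fairly long example text"]

def padded_tokens(batch):
    """Total tokens once every text is padded to the longest text in the batch."""
    lengths = [len(t.split()) for t in batch]  # crude word count stands in for tokens
    return max(lengths) * len(batch)

def total_padded(texts, batch_size=2, sort=False):
    if sort:
        texts = sorted(texts, key=lambda t: len(t.split()))
    batches = [texts[i:i + batch_size] for i in range(0, len(texts), batch_size)]
    return sum(padded_tokens(b) for b in batches)

print("unsorted:", total_padded(texts))             # pads short texts next to long ones
print("sorted:  ", total_padded(texts, sort=True))  # groups similar lengths together
```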
October 15, 2025 at 5:26 PM
🤔 ppllm implements windowed PPL, which makes it possible to compute the PPL of arbitrarily long texts.
It aims to be feature complete for many information-theoretic metrics, including perplexity (PPL), surprisal, and bits per character (BPC), and their word-level counterparts.
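Here is a hedged sketch of what windowed (strided) perplexity computes, written with plain Hugging Face transformers rather than ppllm's own API; the model name, window size and stride are illustrative assumptions.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")  # assumption: any causal LM works here
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def windowed_ppl(text: str, max_length: int = 1024, stride: int = 512) -> float:
    """Score arbitrarily long texts by sliding a fixed-size window over the tokens,
    counting each token's negative log-likelihood only once."""
    input_ids = tok(text, return_tensors="pt").input_ids
    seq_len = input_ids.size(1)
    nll_sum, n_scored, prev_end = 0.0, 0, 0
    for start in range(0, seq_len, stride):
        end = min(start + max_length, seq_len)
        n_new = end - prev_end          # tokens not yet scored in a previous window
        window = input_ids[:, start:end]
        targets = window.clone()
        targets[:, :-n_new] = -100      # earlier tokens serve as context only
        with torch.no_grad():
            loss = model(window, labels=targets).loss  # mean NLL over scored tokens
        nll_sum += loss.item() * n_new
        n_scored += n_new
        prev_end = end
        if end == seq_len:
            break
    return float(torch.exp(torch.tensor(nll_sum / n_scored)))

print(windowed_ppl("A very long document. " * 500))
```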
October 15, 2025 at 5:26 PM
Work done with Laurène Cave,
@haldaume3.bsky.social, Léo Labat, Gaël Lejeune, Pierre-Antoine Lequeu,
@bpiwowar.bsky.social, Nazanin Shafiabadi and @yvofr.bsky.social, read the paper here talnarchives.atala.org/ateliers/202...
Any feedback is appreciated :)
talnarchives.atala.org
July 7, 2025 at 8:02 AM
Reposted by Paul Lerner
For the EALM Workshop
"On Assessing the Political Biases of Multilingual Large Language Models" by @lernerp.bsky.social Laurène Cave, @haldaume3.bsky.social Léo Labat, Gaël Lejeune, Pierre-Antoine Lequeu, @bpiwowar.bsky.social Nazanin Shafiabadi and yvofr.bsky.social, collaborated with the STIH lab
June 10, 2025 at 6:39 PM
CNRS provides plmlatex.math.cnrs.fr, which covers most of the features. I guess it's not that complicated to host (the software is open source)
An easy-to-use online LaTeX editor. No installation, real-time collaboration, version control, hundreds of LaTeX document templates, and more.
plmlatex.math.cnrs.fr
May 6, 2025 at 1:04 PM