Jean Barré
@jbarre.bsky.social
PhD student @ École Normale Supérieure in Paris. Working in the Computational Literary Studies field on literary evolution of novel subgenres & canonization process + fr-BookNLP implementation w/ @labolattice.bsky.social
https://crazyjeannot.github.io/
Pinned
What happens when we model the detective archetype at scale? 🕵️♂️📚
Our new paper, accepted for #CHR2025, combines literary history and computational modeling to trace how the figure of the detective evolves across 150 years of French fiction.
arxiv.org/pdf/2511.00627
November 4, 2025 at 5:36 PM
Reposted by Jean Barré
Wrote a short piece arguing that higher ed must help steer AI. TLDR: If we outsource this to tech, we outsource our whole business. But rejectionism is basically stalling. If we want to survive, schools themselves must proactively shape AI for education & research. [1/6, unpaywalled at 5/6] +
Opinion | AI Is the Future. Higher Ed Should Shape It.
If we want to stay at the forefront of knowledge production, we must fit technology to our needs.
www.chronicle.com
November 4, 2025 at 7:55 PM
Reposted by Jean Barré
Awesome! Our new paper in #dhq has just been published!
It discusses three measures of #keyness (or #distinctiveness) when applied to #subgenres of the #French #novel.
The twist is that we perform a #qualitative #evaluation of the measures by relating each list […]
[Original post on fedihum.org]
November 4, 2025 at 7:02 AM
Reposted by Jean Barré
It's been brewing for months: @inriaparisnlp.bsky.social releases CoMMA (Corpus of Multilingual Medieval Archives) !
📚 2.5bn tokens of mostly Latin and French texts
🕰️ 800→1600 CE
📜 23k manuscripts
🖥️ 18k on the reading interface: comma.inria.fr
🔍 Paper: inria.hal.science/hal-05299220v1
(1/🧵)
CoMMA
comma.inria.fr
October 15, 2025 at 2:51 PM
Reposted by Jean Barré
And new paper out: Pleias 1.0: the First Family of Language Models Trained on Fully Open Data
How we train an open everything model on a new pretraining environment with releasable data (Common Corpus) with an open source framework (Nanotron from HuggingFace).
www.sciencedirect.com/science/arti...
September 27, 2025 at 11:44 AM
Reposted by Jean Barré
We're officially launching the new PSL CultureLab in 10 days !
If you're interested in the research of a collective bridging Computational Humanities, Social Sciences and Cultural Evolution, you can check our programme (and come to our event, if you're in Paris 22 September):
psl.eu/agenda/collo...
Inaugural conference of the CultureLab major research programme | PSL
Research: CultureLab launches its work on 22 September 2025 at Campus Condorcet with a day devoted to computational humanities and social sciences and to cultural evolution. Le Gra...
psl.eu
September 12, 2025 at 3:09 PM
Reposted by Jean Barré
✍️ Our paper is finally out!
All poetic forms come from somewhere, but figuring out their relationships is hard.
We use sequence alignment on scansion (010.10) to measure metrical similarity between poems. This allows us to detect related forms across languages and times 1/
tinyurl.com/metronome25
Metronome: tracing variation in poetic meters via local sequence alignment | Computational Humanities Research | Cambridge Core
Metronome: tracing variation in poetic meters via local sequence alignment - Volume 1
www.cambridge.org
June 26, 2025 at 10:02 AM
Reposted by Jean Barré
New this morning, a Comment I contributed to Nature Computational Science on the interaction between large language models and the humanities. 🧪 🤖 #MLSky
rdcu.be/etk07
The link above will be open-access for a month — plus, I'll reply to this post with a link to a permanently open preprint. +
The impact of language models on the humanities and vice versa
Nature Computational Science - Many humanists are skeptical of language models and concerned about their effects on universities. However, researchers with a background in the humanities are also...
rdcu.be
June 25, 2025 at 12:58 PM
I had fun presenting some of my PhD obsessions about the French detective novel in Würzburg.
Thank you @fotisjannidis.bsky.social for the invitation! The whole team is impressive: a brand-new building and talented people. The future of DH is actually here 🤩
June 16, 2025 at 9:39 AM
Reposted by Jean Barré
"Tell, Don't Show" was accepted to #ACL2025 Findings!
Our conceptually intuitive, lightweight approach for literary topic modeling combines the new (language models) with the old (classic LDA) to yield better topics. ✨📚 arxiv.org/abs/2505.23166
May 30, 2025 at 2:12 AM
New little paper “The times are a-changin’: présent vs passé simple in French novels (1811–2024)”👉 hal.science/hal-04984105
With Simon Gabay and @floriancafiero.bsky.social
#dhbenelux2025
In French fiction, the use of past tenses (especially the passé simple) has collapsed over the last 150 years... so why?
The times are a-changin': présent vs passé simple in French novels (1811-2024)
The use of présent and passé simple in French has undergone profound changes in recent centuries. By means of a large corpus of novels, we observe major trends that we attempt to describe and explain....
hal.science
May 6, 2025 at 5:26 PM
Reposted by Jean Barré
🚨New pre-print 🚨
News articles often convey different things in text vs. image. Recent work in computational framing analysis has analysed the article text but the corresponding images in those articles have been overlooked.
We propose multi-modal framing analysis of news: arxiv.org/abs/2503.20960
April 7, 2025 at 9:20 AM
Reposted by Jean Barré
🚨 Our Call for Papers is out! 🚨
We continue our tradition of providing a dedicated platform for presenting computational work that bridges formal methods and traditional inquiry in the arts and humanities.
Check out the website for all details: 2025.computational-humanities-research.org/cfp/
March 26, 2025 at 1:36 PM
Reposted by Jean Barré
Le Monde reporting that a French scientist traveling to Houston to attend a conference was denied entry to US after a search of his phone & computer revealed messages critical of Trump's science cuts, "which [says CPB] conveyed hatred of Trump & could be qualified as terrorism". Computer confiscated
March 19, 2025 at 6:11 PM
Reposted by Jean Barré
Excited to share our preprint “Provocations from the Humanities for Generative AI Research”
We're open to feedback—read & share thoughts!
@laurenfklein.bsky.social @mmvty.bsky.social @docdre.distributedblackness.net @mariaa.bsky.social @jmjafrx.bsky.social @nolauren.bsky.social @dmimno.bsky.social
February 28, 2025 at 1:34 AM
Reposted by Jean Barré
New cultural evolution modelling paper with @bdecourson.bsky.social on @pnas.org!
"Weak individual preferences stabilize culture"
A quick 🧵
www.pnas.org/doi/10.1073/...
February 21, 2025 at 5:24 PM
Reposted by Jean Barré
Change over time is often depicted as a trendline. But what shapes a trendline? Which forces? Our new paper presents a method that allows us to “decompose” trendlines into constituent forces. Also, we tackle an old puzzle: Does culture change “one funeral at a time”? 🧵(1/8) doi.org/10.1098/rspb...
February 5, 2025 at 2:53 PM
Reposted by Jean Barré
[clearing throat]
February 3, 2025 at 12:58 PM
Reposted by Jean Barré
My video on spaCy layout is now out! This is probably my favorite update from @explosion-ai.bsky.social (and that's saying something!) This package makes it simple to do region detection, table detection, and OCR with just 1 line of Python.
Video: youtu.be/quJtzVxoMtE
#MachineLearning
Best Way to OCR a PDF in Python - spaCy Layout
YouTube video by Python Tutorials for Digital Humanities
youtu.be
January 14, 2025 at 12:42 PM
Reposted by Jean Barré
I'll get straight to the point.
We trained 2 new models. Like BERT, but modern. ModernBERT.
Not some hypey GenAI thing, but a proper workhorse model, for retrieval, classification, etc. Real practical stuff.
It's much faster, more accurate, longer context, and more useful. 🧵
December 19, 2024 at 4:45 PM
Reposted by Jean Barré
“They said it could not be done”. We’re releasing Pleias 1.0, the first suite of models trained on open data (either permissibly licensed or uncopyrighted): Pleias-3b, Pleias-1b and Pleias-350m, all based on the two trillion tokens set from Common Corpus.
December 5, 2024 at 4:39 PM
What a session 😍! thank you @mariaa.bsky.social for the thread !
Time for the LLMs session at #chr2024!
"Remember to Forget: A Study on Verbatim Memorization of Literature in Large Language Models" presented by Olga Seminck
They replicated @kentkchang.bsky.social's name cloze task for books memorization, using English and French books.
Computational Humanities Research 2024
2024.computational-humanities-research.org
December 5, 2024 at 2:46 PM
Reposted by Jean Barré
Now @jbarre.bsky.social speaking about "Latent Structures of Intertextuality in French Fiction."
This paper explores genre classification and text similarity over time.
One clear finding: canonical novels are more similar to what comes after them, compared to non-canonical novels.
#chr2024
December 4, 2024 at 3:27 PM