Bastian Bunzeck
@bbunzeck.bsky.social
Computational linguist trying to understand how humans and computers learn and use language 👶🧠🗣️🖥️💬
PhD @clausebielefeld.bsky.social, Bielefeld University
https://bbunzeck.github.io
Reposted by Bastian Bunzeck
Our panel "Evaluating Interpretability Methods: Challenges and Future Directions", moderated by @danaarad.bsky.social, just started! 🎉 Come learn more about the MIB benchmark and hear the takes of @michaelwhanna.bsky.social, Michal Golovanevsky, Nicolò Brunello and Mingyang Wang!
November 9, 2025 at 6:55 AM
Reposted by Bastian Bunzeck
November 7, 2025 at 9:30 AM
Reposted by Bastian Bunzeck
I'm in Suzhou to present our work on MultiBLiMP, Friday @ 11:45 in the Multilinguality session (A301)!
Come check it out if you're interested in multilingual linguistic evaluation of LLMs (there will be parse trees on the slides! There's still use for syntactic structure!)
arxiv.org/abs/2504.02768
November 6, 2025 at 7:08 AM
Reposted by Bastian Bunzeck
One of the great mysteries of #language is how it finds a balance between robust stability and endless flexibility. I believe this requires us to rethink #linguistic structures. In this article, I propose dynamic #tensegrity as a novel architectural metaphor
aclanthology.org/2025.cxgsnlp...
November 4, 2025 at 2:08 PM
As part of this year's BabyLM challenge, we (researchers from @gronlp.bsky.social and @clausebielefeld.bsky.social) diverged from the established pretraining paradigm by training only on dialogue data from CHILDES.
October 28, 2025 at 12:53 PM
Reposted by Bastian Bunzeck
With only a week left until #EMNLP2025, we are happy to announce all the works we 🐮 will present 🥳 Come and say "hi" at our posters and presentations during the main conference and the co-located events (*SEM and workshops). See you in Suzhou ✈️
October 27, 2025 at 11:54 AM
Reposted by Bastian Bunzeck
"The capacity for language exists along a continuum [...]. The idea that language development does not require uniquely human properties becomes increasingly important as legal boundaries expand to include nonhuman species."
October 23, 2025 at 8:49 PM
Reposted by Bastian Bunzeck
🌍Introducing BabyBabelLM: A Multilingual Benchmark of Developmentally Plausible Training Data!
LLMs learn from vastly more data than humans ever experience. BabyLM challenges this paradigm by focusing on developmentally plausible data.
We extend this effort to 45 new languages!
October 15, 2025 at 10:53 AM
Preprint alert! We release BabyBabelLM, a multilingual benchmark of developmentally plausible training data. I was responsible for German and Polish data as well as various child-directed wikis. Immensely rewarding project with exceptionally cool co-authors. 🥳🚀
𝐃𝐨 𝐲𝐨𝐮 𝐫𝐞𝐚𝐥𝐥𝐲 𝐰𝐚𝐧𝐭 𝐭𝐨 𝐬𝐞𝐞 𝐰𝐡𝐚𝐭 𝐦𝐮𝐥𝐭𝐢𝐥𝐢𝐧𝐠𝐮𝐚𝐥 𝐞𝐟𝐟𝐨𝐫𝐭 𝐥𝐨𝐨𝐤𝐬 𝐥𝐢𝐤𝐞? 🇨🇳🇮🇩🇸🇪
Here’s the proof! 𝐁𝐚𝐛𝐲𝐁𝐚𝐛𝐞𝐥𝐋𝐌 is the first Multilingual Benchmark of Developmentally Plausible Training Data available for 45 languages to the NLP community 🎉
arxiv.org/abs/2510.10159
October 14, 2025 at 5:19 PM
Reposted by Bastian Bunzeck
Keynote at #COLM2025: Nicholas Carlini from Anthropic
"Are language models worth it?"
Explains that the prior decade of his work on adversarial images, while it taught us a lot, isn't very applied; it's unlikely anyone is actually altering images of cats in scary ways.
October 9, 2025 at 1:12 PM
Reposted by Bastian Bunzeck
Huge congrats to the envisionBOX team for the Open Science award nomination! 🎉
My tutorial on speech analysis tools in Python from the Unboxing Multimodality summer school (github.com/mdhk/unboxin...) is now also available at envisionbox.org
Thanks for the invitation & this great initiative! 👏
www.envisionbox.org has been shortlisted for the Leo Waaijers Open Science prize: ukb.nl/en/news/shor...
@babajideowoyele.bsky.social @jamestrujillo.bsky.social @sarkadava.bsky.social @DavideAhmar @acwiek.bsky.social
The amazing Markus Küpper made an animated video:
www.youtube.com/watch?v=HduI...
EnvisionBOX overview2025
YouTube video by Wim Pouw
October 2, 2025 at 5:18 PM
Reposted by Bastian Bunzeck
Gentle reminder that the #CfP for #Evolang2026 @evolangconf.bsky.social is still open - deadline October 26! sites.google.com/york.ac.uk/e...
EVOLANG 2026 - Call for Papers
October 2, 2025 at 11:32 AM
Reposted by Bastian Bunzeck
What's the right unit of analysis for understanding LLM internals? We explore this question in our mech interp survey (a major update of our 2024 manuscript).
We’ve added more recent work and more immediately actionable directions for future work. Now published in Computational Linguistics!
October 1, 2025 at 2:03 PM
Reposted by Bastian Bunzeck
New paper! 🚨 I argue that LLMs represent a synthesis between distributed and symbolic approaches to language, because, when exposed to language, they develop highly symbolic representations and processing mechanisms in addition to distributed ones.
arxiv.org/abs/2502.11856
September 30, 2025 at 1:16 PM
Reposted by Bastian Bunzeck
Many AI researchers draw inspiration from neuroscience. Naomi Saphra favors a different analogy. Interpretability, in her view, should take a cue from evolutionary biology.
To Understand AI, Watch How It Evolves | Quanta Magazine
Naomi Saphra thinks that most research into language models focuses too much on the finished product. She’s mining the history of their training for insights into why these systems work the way they…
September 29, 2025 at 8:04 PM
My very first book review is out now 📚
Many thanks to @stefanhartmann.bsky.social for inviting me. Looking forward to our next project(s) 😇
It's been a while since I've written a book review - here's our review of Herbst & Hoffmann (2024), my first but definitely not last collaboration with the brilliant @bbunzeck.bsky.social doi.org/10.1017/S136... ($)
Thomas Herbst and Thomas Hoffmann, A Construction Grammar of the English language: CASA – a constructionist approach to syntactic analysis (Cognitive Linguistics in Practice 5). Amsterdam and Philadel...
September 26, 2025 at 9:43 AM
Reposted by Bastian Bunzeck
I'm conducting research on how ACL's peer review policies impact NLP research quality, career trajectories, and inclusivity within our community. I am running a survey, which would take around 7-10 mins to complete: forms.cloud.microsoft/e/j2jr9nH3X0
I would really appreciate insights from y'all!
September 25, 2025 at 2:23 PM
Reposted by Bastian Bunzeck
🚨 Are you looking for a PhD in #NLProc dealing with #LLMs?
🎉 Good news: I am hiring! 🎉
The position is part of the “Contested Climate Futures" project. 🌱🌍 You will focus on developing next-generation AI methods🤖 to analyze climate-related concepts in content—including texts, images, and videos.
September 24, 2025 at 7:34 AM
Reposted by Bastian Bunzeck
Attending the Second International Workshop on Construction Grammars and NLP (CxGs+NLP 2025) in Düsseldorf, Germany? Check out the poster “Do Construction Distributions Shape Formal Language Learning In German BabyLMs?” by Bastian Bunzeck and colleagues! @bbunzeck.bsky.social #CRC1646 #LINCC
September 23, 2025 at 10:16 AM
From conference to conference: September ends with a trip to #IWCS in beautiful Düsseldorf. Hyped for two days of semantics (and two more days of construction grammar and NLP). 🥳
September 22, 2025 at 7:51 AM
Reposted by Bastian Bunzeck
The first of the three corpora of German-English bilingual children's early speech that we've been working on for the last few years is finally publicly available! 🥳 🎉 talkbank.org/childes/acce...
CHILDES English-German MPI-EVA-Leipzig Corpus
September 19, 2025 at 5:48 AM
Reposted by Bastian Bunzeck
“Developmentally plausible pretraining, now also auf Deutsch: a BabyLM Dataset for German” — Today I had the pleasure to present our German BabyLM dataset together with the first author Bastian Bunzeck @bbunzeck.bsky.social to an interested and engaging audience at #KONVENS2025 in Hildesheim.
September 12, 2025 at 10:34 AM
From conference to conference: after last week’s #semdial I am at #konvens in Hildesheim this week. I will be presenting our German BabyLM Corpus (with @simphon.bsky.social), and our PI Sina Zarrieß will give a keynote on BabyLMs tomorrow. 🥳
September 10, 2025 at 11:08 AM