Bastian Bunzeck
@bbunzeck.bsky.social
Computational linguist trying to understand how humans and computers learn and use language 👶🧠🗣️🖥️💬

PhD @clausebielefeld.bsky.social, Bielefeld University
https://bbunzeck.github.io
Reposted by Bastian Bunzeck
Our panel moderated by @danaarad.bsky.social
"Evaluating Interpretability Methods: Challenges and Future Directions" just started! 🎉 Come to learn more about the MIB benchmark and hear the takes of @michaelwhanna.bsky.social, Michal Golovanevsky, Nicolò Brunello and Mingyang Wang!
November 9, 2025 at 6:55 AM
Reposted by Bastian Bunzeck
#EMNLP2026 will be in Budapest 🇭🇺 24-29/October/2026 (earlier than ever?) #EMNLP2025 #nlp #nlproc
November 7, 2025 at 9:30 AM
Reposted by Bastian Bunzeck
I'm in Suzhou to present our work on MultiBLiMP, Friday @ 11:45 in the Multilinguality session (A301)!

Come check it out if you're interested in multilingual linguistic evaluation of LLMs (there will be parse trees on the slides! There's still use for syntactic structure!)

arxiv.org/abs/2504.02768
November 6, 2025 at 7:08 AM
Reposted by Bastian Bunzeck
One of the great mysteries of #language is how it finds a balance between robust stability and endless flexibility. I believe this requires us to rethink #linguistic structures. In this article, I propose dynamic #tensegrity as a novel architectural metaphor
aclanthology.org/2025.cxgsnlp...
November 4, 2025 at 2:08 PM
As part of this year's BabyLM challenge, we (researchers from @gronlp.bsky.social and @clausebielefeld.bsky.social) diverged from the established pretraining paradigm by training only on dialogue data from CHILDES.
October 28, 2025 at 12:53 PM
Reposted by Bastian Bunzeck
With only a week left for #EMNLP2025, we are happy to announce all the works we 🐮 will present 🥳 - come and say "hi" to our posters and presentations during the Main and the co-located events (*SEM and workshops). See you in Suzhou ✈️
October 27, 2025 at 11:54 AM
Reposted by Bastian Bunzeck
"The capacity for language exists along a continuum [...]. The idea that language development does not require uniquely human properties becomes increasingly important as legal boundaries expand to include nonhuman species."
October 23, 2025 at 8:49 PM
Reposted by Bastian Bunzeck
🌍Introducing BabyBabelLM: A Multilingual Benchmark of Developmentally Plausible Training Data!

LLMs learn from vastly more data than humans ever experience. BabyLM challenges this paradigm by focusing on developmentally plausible data.

We extend this effort to 45 new languages!
October 15, 2025 at 10:53 AM
Preprint alert! We release BabyBabelLM, a multilingual benchmark of developmentally plausible training data. I was responsible for German and Polish data as well as various child-directed wikis. Immensely rewarding project with exceptionally cool co-authors. 🥳🚀
𝐃𝐨 𝐲𝐨𝐮 𝐫𝐞𝐚𝐥𝐥𝐲 𝐰𝐚𝐧𝐭 𝐭𝐨 𝐬𝐞𝐞 𝐰𝐡𝐚𝐭 𝐦𝐮𝐥𝐭𝐢𝐥𝐢𝐧𝐠𝐮𝐚𝐥 𝐞𝐟𝐟𝐨𝐫𝐭 𝐥𝐨𝐨𝐤𝐬 𝐥𝐢𝐤𝐞? 🇨🇳🇮🇩🇸🇪

Here’s the proof! 𝐁𝐚𝐛𝐲𝐁𝐚𝐛𝐞𝐥𝐋𝐌 is the first Multilingual Benchmark of Developmentally Plausible Training Data available for 45 languages to the NLP community 🎉

arxiv.org/abs/2510.10159
October 14, 2025 at 5:19 PM
Reposted by Bastian Bunzeck
Keynote at #COLM2025: Nicholas Carlini from Anthropic

"Are language models worth it?"

Explains that the prior decade of his work on adversarial images, while it taught us a lot, isn't very applied; it's unlikely anyone is actually altering images of cats in scary ways.
October 9, 2025 at 1:12 PM
Reposted by Bastian Bunzeck
i wrote a custom llm sampler for llama-3.1-8b so it could only say words that are in the bible
October 7, 2025 at 4:35 AM
Reposted by Bastian Bunzeck
Huge congrats to the envisionBOX team for the Open Science award nomination! 🎉

My tutorial on speech analysis tools in Python from the Unboxing Multimodality summer school (github.com/mdhk/unboxin...) is now also available at envisionbox.org

Thanks for the invitation & this great initiative! 👏
October 2, 2025 at 5:18 PM
Reposted by Bastian Bunzeck
Gentle reminder that the #CfP for #Evolang2026 @evolangconf.bsky.social is still open - deadline October 26! sites.google.com/york.ac.uk/e...
EVOLANG 2026 - Call for Papers
October 2, 2025 at 11:32 AM
Reposted by Bastian Bunzeck
What's the right unit of analysis for understanding LLM internals? We explore in our mech interp survey (a major update from our 2024 ms).

We’ve added more recent work and more immediately actionable directions for future work. Now published in Computational Linguistics!
October 1, 2025 at 2:03 PM
Reposted by Bastian Bunzeck
New paper! 🚨 I argue that LLMs represent a synthesis between distributed and symbolic approaches to language, because, when exposed to language, they develop highly symbolic representations and processing mechanisms in addition to distributed ones.
arxiv.org/abs/2502.11856
September 30, 2025 at 1:16 PM
Reposted by Bastian Bunzeck
Many AI researchers draw inspiration from neuroscience. Naomi Saphra favors a different analogy. Interpretability, in her view, should take a cue from evolutionary biology.
To Understand AI, Watch How It Evolves | Quanta Magazine
Naomi Saphra thinks that most research into language models focuses too much on the finished product. She’s mining the history of their training for insights into why these systems work the way they…
www.quantamagazine.org
September 29, 2025 at 8:04 PM
My very first book review is out now 📚
Muchas gracias to @stefanhartmann.bsky.social for inviting me, looking forward to our next project(s) 😇
September 26, 2025 at 9:43 AM
Reposted by Bastian Bunzeck
I'm conducting research on how ACL's peer review policies impact NLP research quality, career trajectories, and inclusivity within our community. I am running a survey, which would take around 7-10 mins to complete: forms.cloud.microsoft/e/j2jr9nH3X0

I would really appreciate insights from y'all!
September 25, 2025 at 2:23 PM
Reposted by Bastian Bunzeck
🚨 Are you looking for a PhD in #NLProc dealing with #LLMs?
🎉 Good news: I am hiring! 🎉
The position is part of the “Contested Climate Futures” project. 🌱🌍 You will focus on developing next-generation AI methods 🤖 to analyze climate-related concepts in content, including texts, images, and videos.
September 24, 2025 at 7:34 AM
Reposted by Bastian Bunzeck
Attending the Second International Workshop on Construction Grammars and NLP (CxGs+NLP 2025) in Düsseldorf, Germany? Check out the poster “Do Construction Distributions Shape Formal Language Learning In German BabyLMs?” by Bastian Bunzeck and colleagues! @bbunzeck.bsky.social #CRC1646 #LINCC
September 23, 2025 at 10:16 AM
From conference to conference: September ends with a trip to #IWCS in beautiful Düsseldorf. Hyped for two days of semantics (and two more days of construction grammar and NLP). 🥳
September 22, 2025 at 7:51 AM
Reposted by Bastian Bunzeck
The first of the three corpora of German-English bilingual children's early speech that we've been working on for the last few years is finally publicly available! 🥳 🎉 talkbank.org/childes/acce...
CHILDES English-German MPI-EVA-Leipzig Corpus
September 19, 2025 at 5:48 AM
Reposted by Bastian Bunzeck
“Developmentally plausible pretraining, now also auf Deutsch: a BabyLM Dataset for German”. Today I had the pleasure of presenting our German BabyLM dataset together with the first author Bastian Bunzeck @bbunzeck.bsky.social to an interested and engaging audience at #KONVENS2025 in Hildesheim.
September 12, 2025 at 10:34 AM
Our BabyLMs at #konvens 🥳
Happening now: Sina’s keynote on our BabyLM work. 🥳
September 11, 2025 at 11:34 AM
From conference to conference: after last week’s #semdial I am at #konvens in Hildesheim this week. I will be presenting our German BabyLM Corpus (with @simphon.bsky.social) and our PI Sina Zarrieß will give a keynote on BabyLMs tomorrow. 🥳
September 10, 2025 at 11:08 AM