Bastian Bunzeck
bbunzeck.bsky.social
Computational linguist trying to understand how humans and computers learn and use language 👶🧠🗣️🖥️💬

The work is mysterious and important. See https://bbunzeck.github.io

PhDing at @clausebielefeld.bsky.social
Also, dialogue pairs taken from real data provide a much better reward signal than synthetically generated ones. Here, real data beats synthetic data quite drastically!
October 28, 2025 at 12:56 PM
While performance on most benchmarks also decreases, it actually increases on our own dialogue minimal pairs (real vs. randomly sampled adjacency pairs), from 64% for the pretrained model to 68% after reinforcement learning, even outperforming the BabyLM baseline by 10%.
October 28, 2025 at 12:55 PM
We replaced the standard BabyLM corpus with 10M tokens of dialogue triplets from CHILDES and trained an autoregressive model that we call llamalogue.
October 28, 2025 at 12:54 PM
As part of this year's BabyLM challenge, we (researchers from @gronlp.bsky.social and @clausebielefeld.bsky.social) diverged from the established pretraining paradigm by training only on dialogue data from CHILDES.
October 28, 2025 at 12:53 PM
From conference to conference: September ends with a trip to #IWCS in beautiful Düsseldorf. Hyped for two days of semantics (and two more days of construction grammar and NLP). 🥳
September 22, 2025 at 7:51 AM
From conference to conference: after last week's #semdial, I am at #konvens in Hildesheim this week. I will be presenting our German BabyLM Corpus (with @simphon.bsky.social), and our PI Sina Zarrieß will give a keynote on BabyLMs tomorrow. 🥳
September 10, 2025 at 11:08 AM
🗣️🗣️🗣️❗️❗️❗️
September 5, 2025 at 9:40 AM
I am now presenting a poster at #semdial on the First Language article I wrote with Holger Diessel 😁💬
September 3, 2025 at 1:49 PM
Now coming up: session 1 on naturalistic dialogue 👌
September 3, 2025 at 8:34 AM
There are similarities and fundamental differences. 🤷‍♂️
September 3, 2025 at 8:10 AM
By testing LMs and humans on the same stimuli…
September 3, 2025 at 8:03 AM
The answer is: it depends, but generally yes. 😇
September 3, 2025 at 8:00 AM
Finally, language models also display structural priming effects.
September 3, 2025 at 7:54 AM
What drives LLMs to generate repetitions? Nearby tokens in the context and previous repetitions of constructions.
September 3, 2025 at 7:49 AM
Their repetition patterns are quite similar to humans', but less consistent and less local than humans' patterns.
September 3, 2025 at 7:46 AM
For example, in the map task:
September 3, 2025 at 7:39 AM
The frequency of different kinds of repetition changes across development, suggesting an implicit curriculum in human language acquisition.
September 3, 2025 at 7:34 AM
Repetitions increase in complexity as language users progress, and they are also initiated more by learners.
September 3, 2025 at 7:30 AM
First keynote by Arabella Sinclair from the University of Aberdeen on “The many reasons for repetition in Dialogue”.
September 3, 2025 at 7:23 AM
We are very happy to welcome so many participants from so many different places 😇
September 3, 2025 at 7:19 AM
…which has continued to the present day! 😮‍💨
September 3, 2025 at 7:13 AM
Bielefeld University has a long, interdisciplinary history of research on dialogue and communication…
September 3, 2025 at 7:10 AM
After 24 years, SemDial is back in the heart of Ostwestfalen 😎
September 3, 2025 at 7:06 AM
#semdial is about to begin 🥳
September 3, 2025 at 7:01 AM
Super cool talk by Charlotte Pouw on intonational phrasing, pauses and lengthening in TTS systems. They struggle especially with garden-path sentences, so apparently text-to-speech is not yet solved! @conll-conf.bsky.social #ACL2025NLP
July 31, 2025 at 9:19 AM