Lightnews — Scholar-powered news

Bastian Bunzeck

@bbunzeck.bsky.social


Computational linguist trying to understand how humans and computers learn and use language 👶🧠🗣️🖥️💬

The work is mysterious and important. See https://bbunzeck.github.io

PhDing at @clausebielefeld.bsky.social


Bastian Bunzeck

@bbunzeck.bsky.social

Also, the dialogue pairs taken from real data provide a much better reward signal than those synthetically generated. Here, real data beats synthetic data quite drastically!

October 28, 2025 at 12:56 PM

Bastian Bunzeck

@bbunzeck.bsky.social

While performance on most benchmarks also decreases, it actually increases on our own dialogue minimal pairs (real vs. randomly sampled adjacency pairs), from 64% for the pretrained model to 68% after reinforcement learning, even outperforming the BabyLM baseline by 10%.

October 28, 2025 at 12:55 PM
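
A minimal sketch of how accuracy on such dialogue minimal pairs could be computed, assuming the usual minimal-pair setup: an item counts as correct when the model scores the real response higher than the randomly sampled one for the same context. The function names, the example items, and the word-overlap scorer are hypothetical stand-ins. An actual evaluation would plug in a language model's log-probability of the response given the context, and the post does not describe the authors' implementation.

```python
from typing import Callable, Iterable, Tuple

# One item: (context utterance, real response, randomly sampled response).
MinimalPair = Tuple[str, str, str]


def minimal_pair_accuracy(
    score: Callable[[str, str], float],
    pairs: Iterable[MinimalPair],
) -> float:
    """Fraction of items where the real response outscores the random one."""
    items = list(pairs)
    if not items:
        raise ValueError("need at least one minimal pair")
    correct = sum(score(ctx, real) > score(ctx, rand) for ctx, real, rand in items)
    return correct / len(items)


def toy_overlap_score(context: str, response: str) -> float:
    """Hypothetical stand-in scorer: number of word types shared with the context.

    A real setup would return the model's log-probability of `response`
    conditioned on `context` instead.
    """
    return float(len(set(context.lower().split()) & set(response.lower().split())))


# Illustrative items only, not taken from the paper's test set.
example_pairs = [
    ("where's the kitty ?", "the kitty is gone .", "more juice please ."),
    ("do you want juice ?", "bye bye .", "juice please ."),
]

accuracy = minimal_pair_accuracy(toy_overlap_score, example_pairs)
```

With this toy scorer the first item is judged correctly and the second is not. Chance level in this setup is 50%, which is the reference point for reading scores such as 64% and 68%.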

Bastian Bunzeck

@bbunzeck.bsky.social

We replaced the standard BabyLM corpus with 10M tokens of dialogue triplets from CHILDES and trained an autoregressive model that we call llamalogue.

Examples of dialogue triplets in llamalogue:

*CHI: all gone .
*MOT: where's the kitty ?
*CHI: all gone .

October 28, 2025 at 12:54 PM
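
A minimal sketch of one way to cut a CHAT-style transcript into dialogue triplets, assuming a triplet is simply three consecutive main-tier utterances. The post does not say how the triplets were actually built, so the sliding window and both function names are assumptions for illustration.

```python
from typing import List, Tuple

Utterance = Tuple[str, str]  # (speaker code, utterance text)


def parse_chat_lines(lines: List[str]) -> List[Utterance]:
    """Keep only main-tier lines such as '*CHI: all gone .'."""
    utterances = []
    for line in lines:
        line = line.strip()
        # Skip headers ('@...'), dependent tiers ('%...') and blank lines.
        if not line.startswith("*") or ":" not in line:
            continue
        speaker, text = line[1:].split(":", 1)
        utterances.append((speaker.strip(), text.strip()))
    return utterances


def dialogue_triplets(
    utterances: List[Utterance],
) -> List[Tuple[Utterance, ...]]:
    """Slide a window of three consecutive utterances over a transcript."""
    return [tuple(utterances[i:i + 3]) for i in range(len(utterances) - 2)]


# The example triplet from the post, plus two lines the parser should ignore.
transcript = [
    "@Begin",
    "*CHI: all gone .",
    "%com: dependent tiers are skipped",
    "*MOT: where's the kitty ?",
    "*CHI: all gone .",
]

triplets = dialogue_triplets(parse_chat_lines(transcript))
```

For the five-line transcript above this yields a single triplet (CHI, MOT, CHI). A transcript with fewer than three utterances yields none.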

Bastian Bunzeck

@bbunzeck.bsky.social

As part of this year's BabyLM challenge, we (researchers from @gronlp.bsky.social and @clausebielefeld.bsky.social) diverged from the established pretraining paradigm by training only on dialogue data from CHILDES.

Dialogue Is Not Enough to Make a Communicative BabyLM
(But Neither Is Developmentally Inspired Reinforcement Learning)
Francesca Padovani¹* Bastian Bunzeck²* Manar Ali² Omar Momen²
Arianna Bisazza¹ Hendrik Buschmeier² Sina Zarrieß²
¹ Center for Language and Cognition (CLCG), University of Groningen
² CRC 1646 – Linguistic Creativity in Communication, Bielefeld University
f.padovani@rug.nl bastian.bunzeck@uni-bielefeld.de

October 28, 2025 at 12:53 PM

Bastian Bunzeck

@bbunzeck.bsky.social

From conference to conference: September ends with a trip to #IWCS in beautiful Düsseldorf. Hyped for two days of semantics (and two more days of construction grammar and NLP). 🥳

September 22, 2025 at 7:51 AM

Bastian Bunzeck

@bbunzeck.bsky.social

From conference to conference: after last week’s #semdial I am at #konvens in Hildesheim this week. I will be presenting our German BabyLM Corpus (with @simphon.bsky.social) and our PI Sina Zarrieß will give a keynote on BabyLMs tomorrow. 🥳

September 10, 2025 at 11:08 AM

Bastian Bunzeck

@bbunzeck.bsky.social

🗣️🗣️🗣️❗️❗️❗️

September 5, 2025 at 9:40 AM

Bastian Bunzeck

@bbunzeck.bsky.social

I am about to present a poster at #semdial on the First Language article I wrote with Holger Diessel 😁💬

September 3, 2025 at 1:49 PM

Bastian Bunzeck

@bbunzeck.bsky.social

Now coming up: session 1 on naturalistic dialogue 👌

September 3, 2025 at 8:34 AM

Bastian Bunzeck

@bbunzeck.bsky.social

There are similarities and fundamental differences. 🤷‍♂️

September 3, 2025 at 8:10 AM

Bastian Bunzeck

@bbunzeck.bsky.social

By testing LMs and humans on the same stimuli…

September 3, 2025 at 8:03 AM

Bastian Bunzeck

@bbunzeck.bsky.social

The answer is: it depends, but generally yes. 😇

September 3, 2025 at 8:00 AM

Bastian Bunzeck

@bbunzeck.bsky.social

Finally, language models also display structural priming effects.

September 3, 2025 at 7:54 AM

Bastian Bunzeck

@bbunzeck.bsky.social

What drives LLMs to generate repetitions? Nearby tokens in the context, and the previous repetition of constructions.

September 3, 2025 at 7:49 AM

Bastian Bunzeck

@bbunzeck.bsky.social

Their repetition patterns are quite similar to humans', but they are not as consistent or as local as humans' patterns.

September 3, 2025 at 7:46 AM

Bastian Bunzeck

@bbunzeck.bsky.social

For example in the map task:

September 3, 2025 at 7:39 AM

Bastian Bunzeck

@bbunzeck.bsky.social

The frequency of different kinds of repetition changes across development, suggesting an implicit curriculum in human language acquisition.

September 3, 2025 at 7:34 AM

Bastian Bunzeck

@bbunzeck.bsky.social

Repetitions increase in complexity as language users progress, and they are also initiated more by learners.

September 3, 2025 at 7:30 AM

Bastian Bunzeck

@bbunzeck.bsky.social

First keynote by Arabella Sinclair from the University of Aberdeen on “The many reasons for repetition in Dialogue”.

September 3, 2025 at 7:23 AM

Bastian Bunzeck

@bbunzeck.bsky.social

We are very happy to welcome so many participants from so many different places 😇

September 3, 2025 at 7:19 AM

Bastian Bunzeck

@bbunzeck.bsky.social

…which has continued to the present day! 😮‍💨

September 3, 2025 at 7:13 AM

Bastian Bunzeck

@bbunzeck.bsky.social

Bielefeld University has a long, interdisciplinary history of research on dialogue and communication…

September 3, 2025 at 7:10 AM

Bastian Bunzeck

@bbunzeck.bsky.social

After 24 years, SemDial is back in the heart of Ostwestfalen 😎

September 3, 2025 at 7:06 AM

Bastian Bunzeck

@bbunzeck.bsky.social

#semdial is about to begin 🥳

September 3, 2025 at 7:01 AM

Bastian Bunzeck

@bbunzeck.bsky.social

Super cool talk by Charlotte Pouw on intonational phrasing, pauses and lengthening in TTS systems. They struggle especially with garden-path sentences, so apparently text-to-speech is not yet solved! @conll-conf.bsky.social #ACL2025NLP

Final slide of Charlotte's talk with a QR code to the paper.

July 31, 2025 at 9:19 AM
