Marianne de Heer Kloots
@mdhk.net
Linguist in AI & CogSci 🧠👩‍💻🤖 PhD student @ ILLC, University of Amsterdam

🌐 https://mdhk.net/
🐘 https://scholar.social/@mdhk
🐦 https://twitter.com/mariannedhk
Thanks to all co-authors in the Dutch SSL training team @hmohebbi.bsky.social @cpouw.bsky.social @gaofeishen.com @wzuidema.bsky.social + Martijn Bentum

And to @itcooperativesurf.bsky.social (EINF-8324) for granting me the resources that enabled this project 👩‍💻✨
August 27, 2025 at 2:31 PM
Check out the paper for more details:
📄 arxiv.org/abs/2506.00981

Or the model, dataset and code released alongside it:
🤗 huggingface.co/amsterdamNLP...
🗃️ zenodo.org/records/1554...
🔍 github.com/mdhk/SSL-NL-...

We hope these resources help further research on language-specificity in speech models!
August 27, 2025 at 2:31 PM
Finally, downstream performance on Dutch speech-to-text transcription reflects the language-specific advantage for Dutch linguistic feature encoding in model-internal representations: on average, Wav2Vec2-NL has a 27% lower word error rate than the multilingual model.
August 27, 2025 at 2:31 PM
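For concreteness: word error rate is the word-level edit distance between a reference transcript and the model's output, normalised by reference length, so a 27% lower WER means scoring roughly 0.22 where the multilingual model scores 0.30. A minimal sketch in plain Python (an illustration, not the paper's evaluation code):

def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

# One deletion and one substitution over five reference words -> WER = 0.4
print(word_error_rate("ik heb het boek gelezen", "ik het broek gelezen"))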
Furthermore, Wav2Vec2-NL shows a stronger advantage on dialogue (IFADV) than on audiobook (MLS) data.
➡️ Training on conversational speech is important not only for enhancing the representation of conversation-level structures, but also for the encoding of smaller linguistic units (phones & words).
August 27, 2025 at 2:31 PM
But there are also interesting differences between methods: for example, trained probes reveal stronger language-specific advantages for phonetic encoding than zero-shot metrics do.

➡️ Language-specific phonetic information may only take up a relatively small subspace of model-internal representations.
August 27, 2025 at 2:31 PM
We find that language-specific advantages are well detected by trained clustering or classification probes, and partially observable using zero-shot metrics. That is, the encoding of Dutch linguistic features is enhanced in the Dutch model compared to models trained on English and multilingual data.
August 27, 2025 at 2:31 PM
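For readers unfamiliar with the method: a trained classification probe fits a lightweight classifier on frozen frame-level representations and compares held-out accuracy across models. A minimal sketch of the idea (not the paper's exact setup; the feature matrix and phone labels below are random placeholders):

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Placeholder inputs: X holds frame-level hidden states from one model layer
# (n_frames x hidden_dim), y holds the aligned phone label for each frame.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 768))
y = rng.integers(0, 40, size=2000)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# A linear probe: higher held-out accuracy on one model than another suggests
# the probed feature is more linearly decodable from that model's layer.
probe = LogisticRegression(max_iter=1000)
probe.fit(X_train, y_train)
print("phone probe accuracy:", probe.score(X_test, y_test))

Comparing such held-out accuracies across models, layer by layer, is the general logic behind probing results like the ones described above.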
But those studies also used different analysis techniques.

We designed the SSL-NL dataset to test the encoding of Dutch phonetic and lexical features in SSL speech representations, while allowing for comparisons across different analysis methods.

We compare both trained probes (*) and zero-shot metrics:
August 27, 2025 at 2:31 PM
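Zero-shot metrics, by contrast, read structure directly off the representations without fitting any parameters. As an illustration only (not necessarily one of the metrics used in the paper), here is a simple cosine-similarity-based same/different phone discrimination score:

import numpy as np

def same_different_score(frames, labels, n_trials=5000, seed=0):
    """Fraction of sampled triples where a same-phone pair of frames is more
    cosine-similar than a different-phone pair. 0.5 = chance; higher values
    mean phone identity is better reflected in the representation geometry."""
    rng = np.random.default_rng(seed)
    # Unit-normalise so dot products are cosine similarities.
    frames = frames / np.linalg.norm(frames, axis=1, keepdims=True)
    wins, trials = 0, 0
    for _ in range(n_trials):
        a = rng.integers(len(frames))
        same_pool = np.flatnonzero(labels == labels[a])
        same_pool = same_pool[same_pool != a]
        if len(same_pool) == 0:  # phone occurs only once; skip this trial
            continue
        same = rng.choice(same_pool)
        diff = rng.choice(np.flatnonzero(labels != labels[a]))
        wins += frames[a] @ frames[same] > frames[a] @ frames[diff]
        trials += 1
    return wins / trials

# Placeholder inputs with the same shape conventions as the probe sketch above.
rng = np.random.default_rng(1)
frames = rng.normal(size=(2000, 768))
labels = rng.integers(0, 40, size=2000)
print("same/different score:", same_different_score(frames, labels))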
Wav2Vec2-NL is trained from scratch, exclusively on 831 hours of Dutch speech recordings. Does this help the model encode Dutch-specific phonetic and lexical information?

Previous studies analyzing language-specific representations in speech SSL models have reported mixed results.
August 27, 2025 at 2:31 PM
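For anyone who wants to inspect such representations themselves: frame-level hidden states can be extracted from any Wav2Vec2-style checkpoint with Hugging Face transformers. A minimal sketch; the model identifier below is a generic placeholder, not the released Wav2Vec2-NL checkpoint (see the amsterdamNLP Hugging Face link shared above for that):

import torch
from transformers import AutoFeatureExtractor, Wav2Vec2Model

# Placeholder model id: substitute the released Wav2Vec2-NL checkpoint
# from the amsterdamNLP Hugging Face page linked earlier in the thread.
MODEL_ID = "facebook/wav2vec2-base"

feature_extractor = AutoFeatureExtractor.from_pretrained(MODEL_ID)
model = Wav2Vec2Model.from_pretrained(MODEL_ID)
model.eval()

# A one-second dummy waveform at 16 kHz stands in for a real Dutch recording.
waveform = torch.zeros(16000)

inputs = feature_extractor(waveform.numpy(), sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)

# One (1, n_frames, hidden_dim) tensor per layer; these per-layer frame
# representations are what probes and zero-shot metrics read from.
for i, layer in enumerate(outputs.hidden_states):
    print(f"layer {i}: {tuple(layer.shape)}")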
We also share a working bibliography of recent publications reporting speech model interpretability analyses, which we've compiled while surveying the literature. It is incomplete and we would love your input! github.com/mdhk/awesome...
August 20, 2025 at 5:09 AM
The materials include slides and notebooks by @grzegorz.chrupala.me, Martijn Bentum, @cpouw.bsky.social, @hmohebbi.bsky.social, @gaofeishen.com, @wzuidema.bsky.social & me.
Find an overview here: interpretingdl.github.io/speech-inter...
August 19, 2025 at 9:23 PM
Last but not least, I personally can’t wait for the social event on Thursday night that we’ve been planning for the past year ✨
It features a *live brain-controlled music act* by the AIAR collective 🧠🎶 2025.ccneuro.org/social-event/ Get one of the last remaining tickets at the registration desk now!
August 12, 2025 at 2:19 PM
Raquel Fernández will present our joint project with @annabavaresco.bsky.social and Sandro Pezzelle: Modelling Multimodal Integration in Human Concept Processing with Vision-Language Models (poster B32)
🔗 2025.ccneuro.org/poster/?id=D...
August 12, 2025 at 2:19 PM