Jaap Jumelet
@jumelet.bsky.social
Postdoc @rug.nl with Arianna Bisazza.

Interested in NLP, interpretability, syntax, language acquisition and typology.
For more information check out the website, paper, and datasets:

Website: babylm.github.io/babybabellm/
Paper: arxiv.org/pdf/2510.10159

We hope BabyBabelLM will continue as a 'living resource', fostering both more efficient NLP methods and new avenues for cross-lingual computational linguistics!
October 15, 2025 at 10:53 AM
Alongside our training resources, we also release an evaluation pipeline that assesses different aspects of language learning.

We present results for various simple baseline models, but hope this can serve as a starting point for a multilingual BabyLM challenge in future years!
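For readers curious what such an evaluation concretely involves, here is a minimal sketch of BLiMP-style minimal-pair scoring with a causal LM: the model is credited with a pair if it assigns a higher log-probability to the grammatical sentence than to its ungrammatical counterpart. The checkpoint name and example pair are placeholders, and this is a sketch rather than the actual BabyBabelLM pipeline.

```python
# Sketch: score a minimal pair with a causal LM and check that the
# grammatical sentence receives the higher log-probability.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # placeholder; any causal LM checkpoint would do

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

def sentence_logprob(sentence: str) -> float:
    """Sum of token log-probabilities of the sentence under the LM."""
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits
    # Predict token t from the tokens before it (shift by one position).
    log_probs = torch.log_softmax(logits[:, :-1], dim=-1)
    targets = ids[:, 1:]
    token_lp = log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    return token_lp.sum().item()

# Placeholder minimal pair (subject-verb agreement).
good = "The keys to the cabinet are on the table."
bad = "The keys to the cabinet is on the table."
print(sentence_logprob(good) > sentence_logprob(bad))  # ideally True
```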
October 15, 2025 at 10:53 AM
To deal with data imbalances, we divide languages into three Tiers. This better enables cross-lingual studies and makes it possible for low-resource languages to be a part of BabyBabelLM as well.
October 15, 2025 at 10:53 AM
With a fantastic team of international collaborators, we have developed a pipeline for creating LM training data from resources that children are exposed to.

We release this pipeline and welcome new contributions!

Website: babylm.github.io/babybabellm/
Paper: arxiv.org/pdf/2510.10159
October 15, 2025 at 10:53 AM
As kids (in Breda) we often played "1 keer tets", where you were allowed to let the football bounce at most once; I also had no idea that was a Brabant dialect word.
September 1, 2025 at 2:33 PM
Congrats and good luck in Canada!
July 1, 2025 at 11:05 PM
Ohh cool! Nice to see the interactions-as-structure idea I had back in 2021 is still being explored!
June 12, 2025 at 10:37 PM
Sharply written and I fully agree, but it is a bit ironic that the message sits behind a 450 euro paywall :') (thanks for the screenshots!)
April 23, 2025 at 11:40 AM
That is definitely possible, and a potential confounding factor. In RuBLiMP, a Russian benchmark, they defined a way to validate this based on LM probabilities, but we left that open for future work. The poor performance on low-resource languages shows they're definitely not trained on all of UD, though!
April 17, 2025 at 7:03 PM