Lightnews — Scholar-powered news

Nikitas Theodoropoulos

@nikitas-theo.bsky.social

18 followers 60 following 2 posts

You can learn more about me here: https://nikitas-theo.github.io/


Nikitas Theodoropoulos

@nikitas-theo.bsky.social

Very happy to release BabyBabelLM to the world: A multilingual benchmark of developmentally plausible pretraining data! Grateful to be part of this amazing team of international researchers. 🎉 🤗
We also welcome (and support) contributions for new languages and data!

Jaap Jumelet @jumelet.bsky.social · Oct 15

🌍Introducing BabyBabelLM: A Multilingual Benchmark of Developmentally Plausible Training Data!

LLMs learn from vastly more data than humans ever experience. BabyLM challenges this paradigm by focusing on developmentally plausible data.

We extend this effort to 45 new languages!

October 15, 2025 at 1:18 PM

Reposted by Nikitas Theodoropoulos

Bastian Bunzeck

@bbunzeck.bsky.social

Preprint alert! We release BabyBabelLM, a multilingual benchmark of developmentally plausible training data. I was responsible for German and Polish data as well as various child-directed wikis. Immensely rewarding project with exceptionally cool co-authors. 🥳🚀

Francesca Padovani @frap98.bsky.social · Oct 14

Do you really want to see what multilingual effort looks like? 🇨🇳🇮🇩🇸🇪

Here’s the proof! BabyBabelLM is the first Multilingual Benchmark of Developmentally Plausible Training Data available for 45 languages to the NLP community 🎉

arxiv.org/abs/2510.10159

October 14, 2025 at 5:19 PM

Reposted by Nikitas Theodoropoulos

Francesca Padovani

@frap98.bsky.social

October 14, 2025 at 5:01 PM
