🥈 Spreading science over hype in #ML & #NLP
Proud shareLM💬 Donor
@IBMResearch & @MIT_CSAIL
See you soon at BabyLM (EMNLP)
LLMs learn from vastly more data than humans ever experience. BabyLM challenges this paradigm by focusing on developmentally plausible data
We extend this effort to 45 new languages!
We release this pipeline and welcome new contributions!
Website: babylm.github.io/babybabellm/
Paper: arxiv.org/pdf/2510.10159
Here’s the proof! 𝐁𝐚𝐛𝐲𝐁𝐚𝐛𝐞𝐥𝐋𝐌 is the first Multilingual Benchmark of Developmentally Plausible Training Data, available in 45 languages to the NLP community 🎉
arxiv.org/abs/2510.10159
🗓️ October 10th, Room 518C
🔹 Invited talks from @sarah-nlp.bsky.social, John Hewitt, @amuuueller.bsky.social, and @kmahowald.bsky.social
🔹 Paper presentations and posters
🔹 Closing roundtable discussion
Join us in Montréal! @colmweb.org
3x over JPEG/PNG etc.
6x over zlib, gzip etc.
How?
We all know LLMs provide a probability distribution over data, which is all classical compression needs
(arithmetic coding, see below)
Understanding is compressing, but this time not by the weights themselves
🤖📈🧠
#AI #compress #data
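Not the paper's code, just a minimal sketch of the point: an ideal arithmetic coder spends about -log2 p(token | prefix) bits per token, so summing an LM's log-probabilities already gives the achievable compressed size (gpt2 here is an arbitrary small stand-in, any causal LM works the same way):

```python
# Hedged sketch: theoretical compressed size under an LM + arithmetic coding.
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

text = "Understanding is compressing."
ids = tok(text, return_tensors="pt").input_ids  # (1, T)

with torch.no_grad():
    logits = model(ids).logits                  # (1, T, vocab)

# An ideal arithmetic coder spends -log2 p(token | prefix) bits per token.
log_p = torch.log_softmax(logits[0, :-1], dim=-1)          # natural log
nll = -log_p[torch.arange(ids.size(1) - 1), ids[0, 1:]]    # nats per token
model_bits = nll.sum().item() / math.log(2)                # nats -> bits
raw_bits = 8 * len(text.encode("utf-8"))
print(f"{model_bits:.0f} bits vs {raw_bits} raw bits "
      f"({raw_bits / model_bits:.1f}x, in the ideal-coder limit)")
```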
2 papers find:
There are phase transitions where features emerge and stay throughout learning
🤖📈🧠
alphaxiv.org/pdf/2509.17196
@amuuueller.bsky.social @abosselut.bsky.social
alphaxiv.org/abs/2509.05291
and they fail😆
They show that humans are bad at predicting what is helpful, and so are reward models (all close to chance).
Reward models don't even predict what helps LLMs
RL🤔
🤖📈🧠
#AI #LLM
@iclr_conf
writing
Know anyone who needs tips?
Want a graph checklist?
Know any good tips you wanna add?
The writing guide:
docs.google.com/document/d/1...
Nikhil Kandpal & Colin Raffel calculate a really low bar for how much it would cost to produce LLM training data at $3.80/hour.
Well, it's several orders of magnitude more than the compute.
Luckily (?), companies don't pay for the data
🤖📈🧠
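A rough back-of-envelope in the same spirit (only the $3.80/hour wage is from the post; the writing speed, tokens-per-word ratio, and corpus size are my assumptions, not theirs):

```python
# Back-of-envelope: cost of paying humans to write a pretraining corpus.
# Only the wage comes from the paper; everything else is an assumption.
wage_per_hour = 3.80      # the paper's low-bar hourly wage
words_per_hour = 900      # assumed writing speed (~15 words/minute)
tokens_per_word = 1.3     # assumed tokenizer ratio
corpus_tokens = 15e12     # assumed modern pretraining corpus, ~15T tokens

hours = corpus_tokens / tokens_per_word / words_per_hour
cost = hours * wage_per_hour
print(f"~${cost:,.0f}")   # tens of billions of dollars, vs ~1e8 for compute
```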
With 10K words, mapped to modern words (when applicable)
There are so many fascinating questions out there
www.arxiv.org/abs/2508.15791
For many reasons, such as domain, mismatch between current abilities and what post-training unfolds, "emergence", etc.
A big factor is that next-token prediction != choice comparison != accuracy
www.alphaxiv.org/abs/2406.04391
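To make the three readouts concrete, here's a toy sketch (mine, not the paper's; gpt2 and the question are arbitrary). The same model gives a smooth per-token loss, a discrete ranking over fixed choices, and a string-matched generation, and these can flip at different times during training:

```python
# Hedged sketch: three ways to "evaluate" one model on one question.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

prompt = "Q: What is the capital of France?\nA:"
choices = [" Paris", " Lyon"]  # toy multiple-choice item

def cont_logprob(prompt, cont):
    """Total log-prob of the continuation (the 'choice comparison' view)."""
    ids = tok(prompt + cont, return_tensors="pt").input_ids
    n_prompt = tok(prompt, return_tensors="pt").input_ids.size(1)
    with torch.no_grad():
        log_p = torch.log_softmax(model(ids).logits[0, :-1], dim=-1)
    targets = ids[0, 1:]
    per_tok = log_p[torch.arange(targets.size(0)), targets]
    return per_tok[n_prompt - 1:].sum().item()  # continuation tokens only

# View 1: choice comparison -- rank the fixed options by log-prob.
ranked = max(choices, key=lambda c: cont_logprob(prompt, c))

# View 2: accuracy -- greedy-generate free text and string-match.
out = model.generate(tok(prompt, return_tensors="pt").input_ids,
                     max_new_tokens=3, do_sample=False,
                     pad_token_id=tok.eos_token_id)
print(ranked, "|", tok.decode(out[0]))
# View 3, the per-token loss, improves smoothly while 1 and 2 jump
# discretely -- hence benchmarks built on them can diverge.
```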
Still, LLMs appear to consistently follow the values of secular/rational people who strive for self-expression (sounds like me😅)
To show it, they collect and release
200K human-model chats + feedback, across 5 languages and 21 LLMs
🤖📈🧠
Remember, exciting questions drive science, exciting answers follow.
Setting the right goal may make all their SOTA chasing worthwhile.
Make an insightful dataset; lead with evaluation
🤖📈🧠
Gist: autonomous post-training from conversational signals for LLM bootstrapping ... look ma, no annotations! no hand-holding! 🙌📈🚀
www.youtube.com/watch?v=qW8S...
Ones that trick the reviewers but do not raise our scores?
Proposals?
A blogpost on human-model interaction, games, training and testing LLMs
research.ibm.com/blog/LLM-soc...
🤖📈🧠
We've learned recently that specific data deterministically causes loss spikes, regardless of the optimizer.
What did we see when we stopped pretraining and then continued?
A huge spike, and never a recovery. Why?
Apparently the momentum matters, a lot.
🤖📈🧠
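The practical takeaway, as a minimal PyTorch sketch (the usual checkpointing pattern, not the paper's code): resuming from weights alone silently resets Adam's momentum buffers, which is exactly the fresh-optimizer jolt that spikes the loss:

```python
# Minimal sketch: save/restore optimizer state so resuming keeps momentum.
import torch

model = torch.nn.Linear(10, 10)
opt = torch.optim.AdamW(model.parameters(), lr=1e-3)

# ... training builds up exp_avg / exp_avg_sq (momentum) in opt.state ...

torch.save({"model": model.state_dict(),
            "optim": opt.state_dict()}, "ckpt.pt")

ckpt = torch.load("ckpt.pt")
model.load_state_dict(ckpt["model"])
# Loading only "model" would restart with zeroed momentum, so the first
# steps after resume behave like a brand-new optimizer -> a spike.
opt.load_state_dict(ckpt["optim"])   # restoring momentum avoids the jolt
```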
digital-strategy.ec.europa.eu/en/news/comm...
#safety #ethicalAI #AI
🤖📈🧠
Tag people who should know, or who can pass the knowledge on
Here is an (updating) list of thoughts, premature ideas, and research directions that I cannot pursue alone but believe can make radical changes
Please fight me, discuss, and ask
Disrrruption, aye?🏴☠️
🤖📈🧠
Why isn't tokenization learned? Could we use an evolutionary algorithm, train a tokenization scheme against a pretrained model, or meta-learn on a fast pretraining proxy (loss at the beginning of training)? Sketch below.
Let's discuss👇
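Here's one toy version of the evolutionary idea (entirely my sketch, nothing published): evolve a small merge vocabulary under a cheap fitness proxy; a real version would swap the proxy for the loss after a few fast pretraining steps:

```python
# Toy evolutionary search over a tokenizer's merge table (illustrative only).
import random

corpus = "the cat sat on the mat the cat ate the rat " * 50
bigrams = {corpus[i:i + 2] for i in range(len(corpus) - 1)}
pool = [b for b in bigrams if " " not in b]   # candidate 2-char merges

def encode_len(merges):
    """Greedy left-to-right: a known bigram costs 1 token, else 1 per char."""
    i, n = 0, 0
    while i < len(corpus):
        i += 2 if corpus[i:i + 2] in merges else 1
        n += 1
    return n

def fitness(merges):
    # Proxy fitness: fewer tokens. A real run would use early-pretraining loss.
    return -encode_len(merges)

population = [set(random.sample(pool, 6)) for _ in range(20)]
for generation in range(30):
    population.sort(key=fitness, reverse=True)
    survivors = population[:10]
    children = []
    for _ in range(10):
        child = set(random.choice(survivors))
        child.discard(random.choice(sorted(child)))  # mutate: drop one merge,
        child.add(random.choice(pool))               # add another
        children.append(child)
    population = survivors + children

best = max(population, key=fitness)
print(sorted(best), encode_len(best), "tokens vs", len(corpus), "chars")
```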
Can your model do it?
Beat them at the NeurIPS competition.
🤖📈🧠
🤖 Pit your agents against others in Mafia, Codenames, Prisoner’s Dilemma, Stag Hunt, and Colonel Blotto.
Sign up now for $500 in compute credits on your initial run!
🔗 Register : mindgamesarena.com