Leshem (Legend) Choshen @EMNLP
@lchoshen.bsky.social
🥇 LLMs together (co-created model merging, BabyLM, textArena.ai)
🥈 Spreading science over hype in #ML & #NLP
Proud shareLM💬 Donor

@IBMResearch & @MIT_CSAIL
I just saw an LLM making a fluency mistake. How can that happen?! Something about long context?
January 23, 2026 at 9:20 PM
"Kill your darlings, cut unless serving the story"
One tip from the guide.

If you need a moment away from #ICML2026 writing and graphs, why not read some writing and figure-making tips?
docs.google.com/document/d/1...
Ever Growing Academic Writing
Call for Collaboration: This aims to help academic writers. If you have any additions or corrections, please add them or comment...
docs.google.com
January 15, 2026 at 7:43 PM
"It is time to separate language from language models"
The realization keeps bugging me, so it also underlies the talk I just gave: "Multilingual?"
Thought I'd briefly share the contents of the talk:
🤖📈🧠
January 14, 2026 at 4:52 PM
From the conference that brought you the AC instead of a rebuttal:
Desk rejections months after reviews were given.

@iclr-conf.bsky.social - initiatives are good, but...
January 1, 2026 at 6:52 AM
Reposted by Leshem (Legend) Choshen @EMNLP
It'd be interesting to study how the substance of what functioning robots learn differs from LLMs. LLMs are based on language which in a sense is at least one level of abstraction up from experiential reality; language is humans' expression (a compression) of their collective experiences.
December 21, 2025 at 4:01 PM
On the unexplained similarity across networks

In behavior, learning order, and weights, we keep seeing evidence that learning is more consistent across networks than one might think.

A walk through the occurrences, my thoughts, and the open question: why?!
What's your hypothesis? Share missed papers and thoughts.
🤖📈🧠 #AI
December 21, 2025 at 10:46 AM
“Today, I have a vision, a vision of superintelligence from experience”

Presented in his humble way, Rich Sutton shares his vision of what AI needs:
general, experiential, discovering its own abstractions, and not bitter 🤢
#NeurIPS2025 #NeurIPS
🤖📈🧠
December 3, 2025 at 5:37 PM
Reposted by Leshem (Legend) Choshen @EMNLP
LLMs do not learn from experience
LLMs do not learn from explicit corrections
LLMs do not learn from being told the answer
LLMs do not learn from being shown how to solve it
We study machine learning; these are opportunities!
A gold mine of research.
December 2, 2025 at 11:22 PM
"Hey dude, look. What is this button doing?"
⚡️BzZzZz⚡️
"Hey dude,..."
Would you press the button again?
Would an LLM?

Evolving LLMs, diverse open LLMs, and their evaluation are on my mind.
Before I share more, I encourage you to say hi here or in #NeurIPS 🤖📈🧠
December 2, 2025 at 11:22 PM
Join us at NeurIPS 2025 for the MindGames Challenge Workshop!
Explore theory of mind, game intelligence, and multi-agent LLMs in interactive game environments.
🗓 Sunday, December 7
⏰ 8:00–10:45 AM
📍 San Diego Convention Center, Ballroom 6CF
🤖📈🧠
November 29, 2025 at 4:14 PM
Quietly, Yuntian Deng released 2 million new chats of real humans with models (non-toxic).
Kudos!
There are now datasets of over 4.5M chats open for research, and all in the same format (shareLM)!
huggingface.co/datasets/sha...
h/t @msheshera.bsky.social
shachardon/ShareLM · Datasets at Hugging Face
We’re on a journey to advance and democratize artificial intelligence through open source and open science.
huggingface.co
November 19, 2025 at 8:29 PM
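If you want to poke at the collection, here is a minimal loading sketch. It uses the standard Hugging Face datasets API, but the split name and record fields are assumptions to check against the dataset card.

```python
# Minimal sketch: load the ShareLM collection from the Hugging Face Hub.
# The split name is an assumption; check the dataset card for the actual
# splits and record fields.
from datasets import load_dataset

ds = load_dataset("shachardon/ShareLM", split="train")
print(ds)     # size and column names
print(ds[0])  # one chat record in the shared ShareLM format
```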
001-110 101-?
AAT-TTA TAT-?
In-context learning emerges outside language, a wonderful finding.
For years since the GPT-2 paper, emergent in-context learning (ICL) from 'next-token' training has been treated as something deeply tied to 𝐡𝐮𝐦𝐚𝐧 𝐥𝐚𝐧𝐠𝐮𝐚𝐠𝐞. But … is it?
November 19, 2025 at 8:28 PM
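Not the paper's experiment, just a toy illustration of the kind of symbolic pattern-completion prompt involved; the model choice and prompt format here are my own assumptions.

```python
# Toy probe of in-context pattern completion on non-linguistic sequences,
# in the spirit of the bit/DNA examples above. Illustrative only: gpt2 and
# the prompt layout are assumptions, not the paper's setup.
from transformers import pipeline

generate = pipeline("text-generation", model="gpt2")

prompt = "001-110\n101-"  # bit-flip pattern; the expected completion is 010
out = generate(prompt, max_new_tokens=4, do_sample=False)
print(out[0]["generated_text"])
```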
The Golden pacifiers are ready
See you soon at BabyLM (EMNLP)
November 1, 2025 at 3:43 AM
Reposted by Leshem (Legend) Choshen @EMNLP
Introducing Global PIQA, a new multilingual benchmark for 100+ languages. This benchmark is the outcome of this year’s MRL shared task, in collaboration with 300+ researchers from 65 countries. This dataset evaluates physical commonsense reasoning in culturally relevant contexts.
October 29, 2025 at 3:50 PM
Reposted by Leshem (Legend) Choshen @EMNLP
🌍Introducing BabyBabelLM: A Multilingual Benchmark of Developmentally Plausible Training Data!

LLMs learn from vastly more data than humans ever experience. BabyLM challenges this paradigm by focusing on developmentally plausible data

We extend this effort to 45 new languages!
October 15, 2025 at 10:53 AM
Reposted by Leshem (Legend) Choshen @EMNLP
With a fantastic team of international collaborators we have developed a pipeline for creating LM training data from resources that children are exposed to.

We release this pipeline and welcome new contributions!

Website: babylm.github.io/babybabellm/
Paper: arxiv.org/pdf/2510.10159
October 15, 2025 at 10:53 AM
Reposted by Leshem (Legend) Choshen @EMNLP
𝐃𝐨 𝐲𝐨𝐮 𝐫𝐞𝐚𝐥𝐥𝐲 𝐰𝐚𝐧𝐭 𝐭𝐨 𝐬𝐞𝐞 𝐰𝐡𝐚𝐭 𝐦𝐮𝐥𝐭𝐢𝐥𝐢𝐧𝐠𝐮𝐚𝐥 𝐞𝐟𝐟𝐨𝐫𝐭 𝐥𝐨𝐨𝐤𝐬 𝐥𝐢𝐤𝐞? 🇨🇳🇮🇩🇸🇪

Here’s the proof! 𝐁𝐚𝐛𝐲𝐁𝐚𝐛𝐞𝐥𝐋𝐌 is the first Multilingual Benchmark of Developmentally Plausible Training Data available for 45 languages to the NLP community 🎉

arxiv.org/abs/2510.10159
October 14, 2025 at 5:01 PM
Reposted by Leshem (Legend) Choshen @EMNLP
✨ The schedule for our INTERPLAY workshop at COLM is live! ✨
🗓️ October 10th, Room 518C
🔹 Invited talks from @sarah-nlp.bsky.social John Hewitt @amuuueller.bsky.social @kmahowald.bsky.social
🔹 Paper presentations and posters
🔹 Closing roundtable discussion.

Join us in Montréal! @colmweb.org
October 9, 2025 at 5:30 PM
LLMs, VLMs, ... can compress data:
3x over JPEG/PNG etc.
6x over zlib, gzip etc.
How?
We all know they provide a probability over the data, which is all classical compression needs
(arithmetic coding, see the sketch below)
Understanding is compressing, but this time not by the weights themselves
🤖📈🧠
#AI #compress #data
October 6, 2025 at 4:47 PM
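A toy sketch of the mechanism from the post above. All names here are mine, the fixed-frequency model stands in for an LLM's next-token distribution, and a real coder uses integer arithmetic with renormalization instead of floats.

```python
# Toy arithmetic coder driven by a probability model. The fixed-frequency
# `model` below stands in for an LLM returning its next-token distribution
# given the context so far; floats limit this to short sequences.

def cumulative(probs):
    """Map each symbol to its (low, high) slice of the unit interval."""
    bounds, low = {}, 0.0
    for sym, p in probs.items():
        bounds[sym] = (low, low + p)
        low += p
    return bounds

def encode(symbols, model):
    low, high = 0.0, 1.0
    for i, sym in enumerate(symbols):
        lo_s, hi_s = cumulative(model(symbols[:i]))[sym]
        width = high - low
        low, high = low + width * lo_s, low + width * hi_s
    return (low + high) / 2  # any number inside [low, high) identifies the sequence

def decode(code, length, model, alphabet):
    out, low, high = [], 0.0, 1.0
    for _ in range(length):
        bounds = cumulative(model(out))
        width = high - low
        for sym in alphabet:
            lo_s, hi_s = bounds[sym]
            if low + width * lo_s <= code < low + width * hi_s:
                out.append(sym)
                low, high = low + width * lo_s, low + width * hi_s
                break
    return out

def model(context):  # an LLM would condition on `context` here
    return {"a": 0.7, "b": 0.2, "c": 0.1}

msg = list("aababa")
code = encode(msg, model)
print(code, decode(code, len(msg), model, "abc") == msg)
```

The final interval has width equal to the product of the model's probabilities, so transmitting the code takes about -log2 p(sequence) bits: the better the model predicts the data, the shorter the code, which is where the 3x/6x gains come from.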
Reposted by Leshem (Legend) Choshen @EMNLP
LLMs are trained to mimic a “true” distribution—their reducing cross-entropy then confirms they get closer to this target while training. Do similar models approach this target distribution in similar ways, though? 🤔 Not really! Our new paper studies this, finding 4-convergence phases in training 🧵
October 1, 2025 at 6:08 PM
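For reference, the standard identity behind "reducing cross-entropy confirms they get closer to this target":

```latex
% Cross-entropy to the target p splits into the target's entropy plus the
% KL divergence from the model q_\theta to p:
\[
  H(p, q_\theta) \;=\; -\sum_x p(x)\log q_\theta(x) \;=\; H(p) + D_{\mathrm{KL}}\!\left(p \,\|\, q_\theta\right)
\]
% H(p) is fixed by the data, so any drop in cross-entropy is exactly a drop
% in the KL divergence, i.e. the model moves closer to the target.
```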
Employing mechanistic interpretability to study how models learn, not just where they end up
2 papers find:
There are phase transitions where features emerge and stay throughout learning
🤖📈🧠
alphaxiv.org/pdf/2509.17196
@amuuueller.bsky.social @abosselut.bsky.social
alphaxiv.org/abs/2509.05291
September 26, 2025 at 3:27 PM
Helpfulness is what we are after, and we test it by asking humans for preferences, or reward models.
And they fail 😆

The authors show that humans are bad at predicting what is helpful, and so are reward models (all close to chance).
Reward models don't even predict what helps LLMs.
RL🤔
🤖📈🧠
#AI #LLM
September 24, 2025 at 6:08 PM
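The evaluation behind "close to chance", sketched with hypothetical placeholders; the data layout and the toy judge are mine, not the paper's.

```python
# Sketch: given pairs of responses with a measured helpfulness outcome, check
# how often a judge (a human annotator or a reward model) prefers the response
# that actually helped more. Chance level is 0.5.

def pairwise_accuracy(pairs, judge):
    """pairs: (response_a, response_b, a_helped_more) triples.
    judge(a, b) -> True if it prefers response a over b."""
    correct = sum(judge(a, b) == a_helped_more for a, b, a_helped_more in pairs)
    return correct / len(pairs)

# Toy usage with a dummy judge that simply prefers the longer answer.
pairs = [
    ("short hint", "a detailed worked example", False),
    ("a concrete step-by-step fix", "have you tried rebooting?", True),
]
print(pairwise_accuracy(pairs, judge=lambda a, b: len(a) > len(b)))
```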
Good luck with the @iclr_conf writing!
Know anyone who needs tips?
Want a graph checklist?
Know any good tips you wanna add?

The writing guide:
docs.google.com/document/d/1...
September 17, 2025 at 5:43 PM
The most expensive part of training is the data, not the compute.
Nikhil Kandpal & Colin Raffel calculate a really low bar for how much it would cost to produce LLM training data at $3.80/hour.
It comes out several orders of magnitude more than the compute.
Luckily (?), companies don't pay for the data
🤖📈🧠
September 12, 2025 at 2:20 PM
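A back-of-envelope version of the argument. Only the $3.80/hour figure is from the post; the corpus size and writing speed are my own illustrative assumptions, so read the result as an order of magnitude.

```python
# Back-of-envelope data-production cost. Only the $3.80/hour wage comes from
# the post above; corpus size and human writing throughput are assumptions.

wage_per_hour = 3.80             # lower-bound wage (USD/hour), from the post
tokens_in_corpus = 15e12         # assumed pretraining-scale corpus: 15T tokens
tokens_written_per_hour = 1_000  # assumed human writing throughput

hours = tokens_in_corpus / tokens_written_per_hour
cost = hours * wage_per_hour
print(f"~${cost:,.0f} to produce the data")  # ~$57 billion with these numbers
```

Even against compute budgets in the tens or hundreds of millions of dollars, that is several orders of magnitude more, which is the post's point.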
Anyone willing to do an emergency review for BabyLM?
September 6, 2025 at 12:25 PM