Leshem (Legend) Choshen @EMNLP
@lchoshen.bsky.social
🥇 LLMs together (co-created model merging, BabyLM, textArena.ai)
🥈 Spreading science over hype in #ML & #NLP
Proud shareLM💬 Donor

@IBMResearch & @MIT_CSAIL
The golden pacifiers are ready
See you soon at BabyLM (EMNLP)
November 1, 2025 at 3:43 AM
Reposted by Leshem (Legend) Choshen @EMNLP
Introducing Global PIQA, a new multilingual benchmark for 100+ languages. This benchmark is the outcome of this year’s MRL shared task, in collaboration with 300+ researchers from 65 countries. This dataset evaluates physical commonsense reasoning in culturally relevant contexts.
October 29, 2025 at 3:50 PM
Reposted by Leshem (Legend) Choshen @EMNLP
🌍Introducing BabyBabelLM: A Multilingual Benchmark of Developmentally Plausible Training Data!

LLMs learn from vastly more data than humans ever experience. BabyLM challenges this paradigm by focusing on developmentally plausible data

We extend this effort to 45 new languages!
October 15, 2025 at 10:53 AM
Reposted by Leshem (Legend) Choshen @EMNLP
With a fantastic team of international collaborators we have developed a pipeline for creating LM training data from resources that children are exposed to.

We release this pipeline and welcome new contributions!

Website: babylm.github.io/babybabellm/
Paper: arxiv.org/pdf/2510.10159
October 15, 2025 at 10:53 AM
Reposted by Leshem (Legend) Choshen @EMNLP
𝐃𝐨 𝐲𝐨𝐮 𝐫𝐞𝐚𝐥𝐥𝐲 𝐰𝐚𝐧𝐭 𝐭𝐨 𝐬𝐞𝐞 𝐰𝐡𝐚𝐭 𝐦𝐮𝐥𝐭𝐢𝐥𝐢𝐧𝐠𝐮𝐚𝐥 𝐞𝐟𝐟𝐨𝐫𝐭 𝐥𝐨𝐨𝐤𝐬 𝐥𝐢𝐤𝐞? 🇨🇳🇮🇩🇸🇪

Here’s the proof! 𝐁𝐚𝐛𝐲𝐁𝐚𝐛𝐞𝐥𝐋𝐌 is the first Multilingual Benchmark of Developmentally Plausible Training Data available for 45 languages to the NLP community 🎉

arxiv.org/abs/2510.10159
October 14, 2025 at 5:01 PM
Reposted by Leshem (Legend) Choshen @EMNLP
✨ The schedule for our INTERPLAY workshop at COLM is live! ✨
🗓️ October 10th, Room 518C
🔹 Invited talks from @sarah-nlp.bsky.social John Hewitt @amuuueller.bsky.social @kmahowald.bsky.social
🔹 Paper presentations and posters
🔹 Closing roundtable discussion.

Join us in Montréal! @colmweb.org
October 9, 2025 at 5:30 PM
LLMs, VLMs, ... can compress data
3x better than JPEG/PNG etc.
6x better than zlib, gzip, etc.
How?
We all know they provide a probability over data, which is all classical compression needs
(arithmetic coding, see below)
Understanding is compressing, but this time not by the weights themselves
🤖📈🧠
#AI #compress #data
October 6, 2025 at 4:47 PM
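To make the arithmetic-coding link in the post above concrete, here is a minimal sketch (my own toy setup, not the post's): an arithmetic coder spends about -log2 p(token) bits per token, so any model that assigns next-token probabilities is a compressor. A toy byte unigram stands in for the LLM; swapping in an LLM's conditional probabilities is what yields the gains quoted above.

```python
# Minimal sketch: any model that assigns next-token probabilities is a
# compressor. An arithmetic coder needs about sum(-log2 p(token)) bits,
# within ~2 bits of this ideal. A toy byte unigram stands in for an LLM.
import math
from collections import defaultdict

def ideal_code_length_bits(tokens, prob_fn):
    """Bits an arithmetic coder would spend encoding `tokens` with this model."""
    bits, context = 0.0, []
    for tok in tokens:
        bits += -math.log2(prob_fn(context, tok))  # model's next-token probability
        context.append(tok)
    return bits

def make_unigram(train_bytes):
    """Laplace-smoothed byte unigram; a real LLM would condition on context."""
    counts = defaultdict(int)
    for b in train_bytes:
        counts[b] += 1
    total = len(train_bytes) + 256
    return lambda context, tok: (counts[tok] + 1) / total

data = b"the quick brown fox jumps over the lazy dog " * 20
model = make_unigram(data)  # toy: trained on the very data it compresses
print(f"{ideal_code_length_bits(data, model) / 8:.0f} bytes vs {len(data)} raw")
```

The better the model's probabilities, the shorter the code; that is the whole trick.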
Reposted by Leshem (Legend) Choshen @EMNLP
LLMs are trained to mimic a "true" distribution; their decreasing cross-entropy confirms they get closer to this target during training. Do similar models approach this target distribution in similar ways, though? 🤔 Not really! Our new paper studies this, finding 4 convergence phases in training 🧵
October 1, 2025 at 6:08 PM
Employing mechanistic interpretability to study how models learn, not just where they end up
2 papers find:
There are phase transitions where features emerge and stay throughout learning
🤖📈🧠
alphaxiv.org/pdf/2509.17196
@amuuueller.bsky.social @abosselut.bsky.social
alphaxiv.org/abs/2509.05291
September 26, 2025 at 3:27 PM
Helpfulness is what we are after, and we test it by asking humans for preferences, or reward models.
And they fail😆

They show that humans are bad at predicting what is helpful, and so are reward models (all close to chance).
Reward models don't even predict what helps LLMs.
RL🤔
🤖📈🧠
#AI #LLM
September 24, 2025 at 6:08 PM
Good luck with the @iclr_conf writing!
Know anyone who needs tips?
Want a graph checklist?
Know any good tips you wanna add?

The writing guide:
docs.google.com/document/d/1...
September 17, 2025 at 5:43 PM
The most expensive part of training is the data, not the compute
Nikhil Kandpal & Colin Raffel calculate a really low bar for how much it would cost to produce LLM training data at $3.8/hour
Well, several orders of magnitude more than the compute.
Luckily (?), companies don't pay for the data
🤖📈🧠
September 12, 2025 at 2:20 PM
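A back-of-envelope sketch of that comparison. Every constant except the $3.8/hour wage floor is my own placeholder assumption, not a number from the paper:

```python
# Back-of-envelope sketch; all constants except the $3.8/h wage are
# assumptions of mine, not numbers from Kandpal & Raffel.
tokens = 15e12                     # assumed pretraining corpus size
words = tokens / 1.3               # rough tokens-per-word ratio
writing_hours = words / (20 * 60)  # assumed 20 words/minute writing speed
data_cost = writing_hours * 3.8    # the post's $3.8/hour wage floor
compute_cost = 1e8                 # assumed ~$100M frontier training run
print(f"data ~ ${data_cost:.1e}, compute ~ ${compute_cost:.1e}, "
      f"ratio ~ {data_cost / compute_cost:.0f}x")
```

Under these placeholder numbers the gap is a few hundred x; the paper's own accounting is what the post summarizes.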
Anyone willing to do an emergency review for BabyLM?
September 6, 2025 at 12:25 PM
A dataset of ancient Chinese writings to study with LLMs, including 170K sentences for pretraining
and a 10K-word lexicon mapping to modern words (when applicable)
There are so many fascinating questions out there
www.arxiv.org/abs/2508.15791
August 25, 2025 at 8:09 PM
Why is it hard to predict downstream scores from pretraining?
For many reasons: domain, the mismatch between current abilities and what post-training unlocks, "emergence", etc.
A big factor is that next-token prediction != choice comparison != accuracy
www.alphaxiv.org/abs/2406.04391
August 14, 2025 at 8:15 PM
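A toy numeric illustration of that last point (my own made-up probabilities): the gold answer's loss can keep improving while the model's multiple-choice pick gets worse.

```python
# Toy illustration with made-up numbers: lower NLL on the gold answer
# does not imply the gold answer wins a multiple-choice comparison.
import math

checkpoints = {
    "ckpt1": {"gold": 0.10, "distractor": 0.05},  # gold wins  -> accurate
    "ckpt2": {"gold": 0.20, "distractor": 0.30},  # gold loses -> inaccurate
}
for name, p in checkpoints.items():
    nll = -math.log(p["gold"])             # next-token prediction quality
    correct = p["gold"] > p["distractor"]  # choice comparison
    print(f"{name}: gold NLL {nll:.2f}, {'correct' if correct else 'wrong'}")
# Gold NLL drops 2.30 -> 1.61 while the choice flips to wrong:
# next-token prediction != choice comparison != accuracy.
```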
The right path is unclear; maybe it doesn't even exist.
Still, LLMs appear to consistently follow the values of secular/rational people who strive for self-expression (sounds like me😅)

To show it, they collect and release
200K human-model chats + feedback, across 5 languages and 21 LLMs
🤖📈🧠
August 11, 2025 at 11:55 AM
Pressing matters in evaluation from the GEM talk.
Remember, exciting questions drive science; exciting answers follow.
Setting the right goal may make all the SOTA chasing worthwhile.
Make an insightful dataset; lead by evaluation
🤖📈🧠
August 5, 2025 at 11:40 AM
Reposted by Leshem (Legend) Choshen @EMNLP
The talk for our work on Retrospective Learning from Interactions, which will be at ACL (once I figure out how to squeeze it shorter)

Gist: autonomous post-training from conversational signals for LLM bootstrapping ... look ma, no annotations! no hand-holding! 🙌📈🚀

www.youtube.com/watch?v=qW8S...
Retrospective Learning from Interactions
YouTube video by Yoav Artzi
www.youtube.com
July 25, 2025 at 2:15 PM
ICML proposes we use sophisticated jailbreaks in our papers?
Ones that trick the reviewers but do not raise our scores?
Proposals?
July 24, 2025 at 3:00 PM
Can LLMs learn social skills by playing games?
A blog post on human-model interaction, games, and training and testing LLMs
research.ibm.com/blog/LLM-soc...
🤖📈🧠
Can LLMs learn social skills by playing games?
A new open-source framework, TextArena, pits large language models against each other in competitive environments designed to test and improve their communication skills.
research.ibm.com
July 24, 2025 at 2:16 PM
We don't understand loss spikes, that's clear.
We've learned recently that data deterministically causes spikes, regardless of the optimizer.
What do we see when we stop pretraining and then continue?
A huge spike and no recovery. Why?
Apparently the momentum matters, a lot.
🤖📈🧠
July 22, 2025 at 11:58 AM
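One concrete way momentum enters the resume story, sketched in generic PyTorch (an assumed standard setup, not the post's actual pipeline): resuming from model weights alone resets Adam's momentum and variance buffers to zero, so checkpoint and restore the optimizer state too.

```python
# Generic PyTorch sketch (an assumed standard setup, not the post's code):
# resuming from model weights alone restarts AdamW's momentum/variance
# buffers at zero, which can spike the loss; checkpoint BOTH states.
import torch

model = torch.nn.Linear(512, 512)
opt = torch.optim.AdamW(model.parameters(), lr=3e-4)

# ... train for a while, then save both model and optimizer state:
torch.save({"model": model.state_dict(), "optim": opt.state_dict()}, "ckpt.pt")

# On resume, restore both; `optim` carries the exp_avg / exp_avg_sq buffers.
ckpt = torch.load("ckpt.pt")
model.load_state_dict(ckpt["model"])
opt.load_state_dict(ckpt["optim"])
```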
The EU calls for evaluation experts to counsel on how to evaluate models (they can't belong to a company training models; smart)
digital-strategy.ec.europa.eu/en/news/comm...
#safety #ethicalAI #AI
🤖📈🧠

Tag people who should know or should pass the knowledge
Commission seeks experts for AI Scientific Panel
The European Commission is setting up a scientific panel of independent experts to support the implementation and enforcement of the AI Act.
digital-strategy.ec.europa.eu
July 19, 2025 at 5:39 PM
The ML/LLM AI world is overcrowded✅ and it's hard to find majestic new projects❌
Here is an (updating) list of thoughts, premature ideas, and research directions that I cannot pursue alone, but that I believe can make radical changes.
Please fight me, discuss, and ask.
Disrrruption, aye?🏴‍☠️
🤖📈🧠
July 19, 2025 at 2:27 PM
Surprisingly good tokenization workshop; resurfaced thoughts:🧠📈🤖
Why isn't tokenization learned? Could we use an evolutionary algorithm, train a tokenization scheme against a pretrained model, or meta-learn on something fast to pretrain (loss at the beginning of training)?
Let's discuss👇 (toy sketch below)
July 19, 2025 at 2:15 PM
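A toy sketch of the evolutionary idea above (entirely my speculation, not an existing method): treat a vocabulary as a genome and select for the one that encodes the corpus in the fewest tokens, a cheap fitness proxy; the post's suggestion would swap this proxy for loss early in a fast pretraining run.

```python
# Entirely speculative sketch (my toy setup): evolve a vocabulary with
# fitness = fewest tokens to encode the corpus. The post's idea would use
# loss early in a fast pretraining run as the fitness signal instead.
import random

corpus = "the babylm challenge trains language models on small data " * 50

def tokenize(text, vocab):
    """Greedy longest-match tokenization with a candidate vocabulary."""
    out, i = [], 0
    while i < len(text):
        for length in range(min(8, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if length == 1 or piece in vocab:  # single chars always allowed
                out.append(piece)
                i += length
                break
    return out

def fitness(vocab):
    return -len(tokenize(corpus, vocab))  # fewer tokens = fitter

def mutate(vocab):
    child = set(vocab)
    j = random.randrange(len(corpus) - 4)
    child.add(corpus[j:j + random.randint(2, 4)])    # add a random substring
    if len(child) > 1:
        child.discard(random.choice(sorted(child)))  # drop a random piece
    return child

population = [set() for _ in range(8)]  # start from char-level (empty vocab)
for _ in range(30):
    survivors = sorted(population, key=fitness, reverse=True)[:4]   # select
    population = survivors + [mutate(random.choice(survivors)) for _ in range(4)]

best = max(population, key=fitness)
print("best pieces:", sorted(best, key=len, reverse=True)[:5])
```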
Do models really understand humans (ToM)?
Can your model do it?
Beat them at the NeurIPS competition.
🤖📈🧠
Excited to announce the MindGames @neuripsconf.bsky.social competition is officially LIVE!
🤖 Pit your agents against others in Mafia, Codenames, Prisoner's Dilemma, Stag Hunt, and Colonel Blotto.
Sign up now for $500 in compute credits on your initial run!
🔗 Register : mindgamesarena.com
MindGames Arena Hub - NeurIPS 2025
Theory-of-Mind Challenges for LLM Agents - Four strategic games testing AI collaboration and competition
mindgamesarena.com
July 19, 2025 at 1:22 PM