Dimitris Papailiopoulos
dimitrisp.bsky.social
Researcher @MSFTResearch; Prof @UWMadison (on leave); learning in context; thinking about reasoning; babas of Inez Lily.
https://papail.io
What if for most of your findings you just post a thread and share a GitHub repo, rather than submitting a 15 page NeurIPS paper with < 1/100 the reach?
May 16, 2025 at 2:57 PM
LLMs learn world models, beyond a reasonable doubt. This has been true since GPT-3, but by now it should be even clearer. Without them, "Guess and Check" would not work.

The fact that these "world models" are approximate/incomplete does not disqualify them.
May 12, 2025 at 6:38 PM
Is 1948 widely acknowledged as the birth of language models and tokenizers?

In "A Mathematical Theory of Communication", almost as an afterthought, Shannon suggests the N-gram model for generating English, and notes that word-level tokenization works better than character-level tokenization.
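Shannon's N-gram idea still fits in a few lines. Here is a minimal word-level bigram sketch (the corpus and function names are mine, purely illustrative):

```python
import random
from collections import defaultdict

def train_bigram(text):
    # Word-level tokens, per Shannon's suggestion
    words = text.split()
    model = defaultdict(list)
    for w1, w2 in zip(words, words[1:]):
        model[w1].append(w2)
    return model

def generate(model, start, n=10, seed=0):
    # Sample each next word from the empirical bigram distribution
    random.seed(seed)
    out = [start]
    for _ in range(n):
        successors = model.get(out[-1])
        if not successors:
            break
        out.append(random.choice(successors))
    return " ".join(out)
```

Train it on any text and it will babble plausible-looking word sequences, which is exactly the 1948 demonstration.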
May 7, 2025 at 12:05 PM
Reposted by Dimitris Papailiopoulos
🎉The Phi-4 reasoning models have landed on HF and Azure AI Foundry. The new models are competitive and often outperform much larger frontier models. It is exciting to see the reasoning capabilities extend to more domains beyond math, including algorithmic reasoning, calendar planning, and coding.
May 1, 2025 at 12:50 AM
I am afraid to report, RL works.

I think 2-3 years ago I said I would not work on two ML sub-areas; RL was one of them. I am happy to say that I am not strongly attached to my beliefs.
April 30, 2025 at 8:08 PM
Re: The Chatbot Arena Illusion

Every eval chokes under hill climbing. If we're lucky, there's an early phase where *real* learning (by both models and the community) can occur. I'd argue that a benchmark's value lies entirely in that window. So the real question is: what did we learn?
April 30, 2025 at 4:38 PM
Fun trivia, now that "sycophant" has become common parlance for LLMs flattering users:

In Greek, συκοφάντης (sykophántēs) most typically refers to a malicious slanderer, someone spreading lies, not flattery!

Every time you use it, you’re technically using it wrong :D
April 28, 2025 at 1:58 PM
Come work with us at MSR AI Frontiers and help us figure out reasoning!
We're hiring at the Senior Researcher level (e.g., post-PhD).
Please drop me a DM if you apply!
jobs.careers.microsoft.com/us/en/job/17...
February 21, 2025 at 3:48 PM
o3 can't multiply beyond a few digits...

But I think multiplication, addition, maze solving, and easy-to-hard generalization are actually solvable on standard transformers...

with recursive self-improvement

Below is the accuracy of a tiny model teaching itself how to add and multiply
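The exact recipe from the work isn't reproduced here, but as a hedged sketch, one common self-improvement loop has the current model label slightly harder instances of its own and keeps only high-agreement labels for retraining (`model_sample` and `self_label_round` are hypothetical names):

```python
def self_label_round(model_sample, harder_inputs, k=5):
    """One round of self-improvement data collection: the current model
    labels instances just beyond its training distribution; only labels
    winning a strict majority of k samples are kept for retraining."""
    kept = []
    for x in harder_inputs:
        votes = [model_sample(x) for _ in range(k)]
        top = max(set(votes), key=votes.count)
        if votes.count(top) > k // 2:  # strict-majority filter
            kept.append((x, top))
    return kept
```

Retraining on `kept` and raising the difficulty each round is the loop; the majority filter is one stand-in for the verification/filtering step that makes self-training stable.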
February 13, 2025 at 1:33 PM
Reposted by Dimitris Papailiopoulos
o3 can't multiply beyond a few digits...

But he thinks multiplication, addition, maze solving, and easy-to-hard generalization are actually solvable on standard transformers...

with recursive self-improvement, as presented by @dimitrisp.bsky.social
February 13, 2025 at 7:57 AM
Self-improving Transformers can overcome easy-to-hard and length generalization challenges.

Paper on arxiv coming on Monday.
Link to a talk I gave on this below 👇

Super excited about this work!

Talk : youtube.com/watch?v=szhE...
slides: tinyurl.com/SelfImprovem...
February 2, 2025 at 1:23 PM
Two months before R1 came out, I wrote this in my small notebook of ideas as something to test #schmidhuber
February 1, 2025 at 6:53 PM
Now that we have reasoner LLMs, let's think about how to GRPO problem generators that produce instances sitting right outside the frontier of current capabilities.
January 29, 2025 at 10:25 PM
Reposted by Dimitris Papailiopoulos
🚀 🇬🇷 A year in the making! I’ve just completed a set of 21 lectures in Machine Learning, in Greek, designed for high school students. The course introduces key ML concepts, coding in Python & PyTorch, and real-world AI applications.
👉 WebPage: tinyurl.com/ye2awe8m
🎥 YouTube: tinyurl.com/2wwjru6z
Μηχανική Μάθηση (Machine Learning) - YouTube
Lectures on Artificial Intelligence and Machine Learning: https://caramanis.github.io/MachineLearningClass/ Welcome to the course on artificial intelligence and machine...
January 29, 2025 at 6:04 PM
If you wanted to collect 1 million reasoning traces from human subjects on, say, math, that would cost ~$50M, assuming ~$50/person/hour and about one trace per hour. Interesting to compare with the cost to generate them from a reasoning LLM at, say, ~$0.5 per trace (~10k tokens). That's 100x cheaper.
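The arithmetic in the post, spelled out (all dollar figures are the post's rough assumptions, not measured prices):

```python
# Rough assumptions from the post
n_traces = 1_000_000
human_dollars_per_hour = 50   # ~$50/person/hour, ~1 trace per hour
llm_dollars_per_trace = 0.5   # ~10k tokens per trace at API prices

human_cost = n_traces * human_dollars_per_hour  # $50M
llm_cost = n_traces * llm_dollars_per_trace     # $500k
print(human_cost / llm_cost)                    # prints 100.0
```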
January 28, 2025 at 9:04 PM
Ok we've read a lot about test-time compute being the new scaling axis, but what's the next scaling axis?
January 28, 2025 at 9:04 PM
Reposted by Dimitris Papailiopoulos
2014 GoogLeNet: The best image classifier was only trainable using weeks of Google's custom infrastructure.

2018 ResNet: A more accurate model is trainable in half an hour on a single GPU.

What stops this from happening for LLMs?
Machine learning progresses when complicated breakthroughs are soon dramatically simplified as people figure out the salient parts.

What a world we're in where this well-trodden pattern rocks financial markets and escalates geopolitical conflict.
January 27, 2025 at 3:16 PM
A strong math/theory foundation can be extremely useful for ML research. Not for proving sample complexity bounds on "AGI", but for offering a mental model of inaccessible, complex systems that allows for accurate predictions without running expensive experiments.
January 26, 2025 at 7:25 PM
The "DeepSeek distilled o1" discussion is intellectually vacuous, precisely because what they reported in the R1 paper is a reproducible phenomenon! By now, many experiments on non-DeepSeek models show that accuracy and inference-time compute increase as the result of outcome-based RL.
January 25, 2025 at 8:17 PM
GRPO and outcome-based RL rely heavily on a verifier with access to ground-truth data. But they can likely work beyond strictly verifiable domains, as long as you have access to a "weak" grader. And perhaps even beyond that, if "correct trajectories" share a common fingerprint...
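A minimal sketch of the two reward regimes mentioned (the `Answer:` convention and all function names are assumptions for illustration, not any specific library's API):

```python
import re

def extract_answer(completion: str) -> str:
    # Assumed convention: the final answer follows "Answer:"
    m = re.search(r"Answer:\s*(.+)", completion)
    return m.group(1).strip() if m else ""

def strict_reward(completion: str, ground_truth: str) -> float:
    # Verifiable domain: binary outcome reward against ground truth
    return 1.0 if extract_answer(completion) == ground_truth else 0.0

def weak_reward(completion: str, grader) -> float:
    # Beyond strictly verifiable domains: a "weak" grader scores the
    # completion in [0, 1] without access to ground truth
    return float(grader(completion))
```

The strict reward is what GRPO-style pipelines use on math/code; the weak grader is the drop-in replacement the post speculates about.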
January 24, 2025 at 9:55 PM
Elated to announce that I got some papers accepted, some rejected, and some withdrawn, at some conference that I won't attend :D
January 24, 2025 at 4:09 PM
1/5 A hypothesis on the emergence of long form "yapping" in reasoning models:

The increase of "yapping" in reasoning models, as they are trained for more rounds of RL, "emerges" (sorry :D) as models discover that verbose reasoning helps them achieve better rewards (e.g., higher accuracy).
January 22, 2025 at 7:22 PM
I love finding silly tests that LLMs are terrible at.
Here's a new one for me: Drawing with Logo (yes the turtle)!
To be fair, drawing with Logo is hard. But... here are 8 examples with Sonnet 3.6 vs o1.

Example 1/8: Draw the letter G
January 20, 2025 at 5:01 AM
Task vectors are akin to punch cards: you feed them to your LLM and it implements specific tasks, without in-context demonstrations. Liu's new paper examines at what scale, where in the network, and when during training they emerge, and how to encourage their emergence.

arxiv.org/pdf/2501.09240
January 18, 2025 at 4:51 PM
resolutions for 2025
- be a good dad and partner
- do more nature stuff
- walk & run more
- spend more time in the water
- think deeper
- read good books, don’t feel bad not finishing all
- be a good mentor & colleague
- figure out what reasoning is
- don’t be reward hacking
- have fun

All doable
January 1, 2025 at 8:31 PM