https://papail.io
The fact that these "world models" are approximate/incomplete does not disqualify them.
In "A Mathematical Theory of Communication", almost as an afterthought, Shannon suggests the N-gram model for generating English, and notes that word-level tokenization works better than character-level tokenization.
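Shannon's word-level N-gram fits in a few lines. Here's a minimal bigram sketch in Python; the toy corpus and function names are illustrative (Shannon estimated frequencies from real English text):

```python
import random
from collections import defaultdict

# Toy corpus (illustrative stand-in for real English text).
corpus = ("the cat sat on the mat and the dog sat on the rug "
          "and the cat saw the dog").split()

# Word-level bigram table: successors[w] lists every word observed after w,
# so sampling uniformly from it matches the empirical P(next | current).
successors = defaultdict(list)
for w1, w2 in zip(corpus, corpus[1:]):
    successors[w1].append(w2)

def generate(start, n_words, seed=0):
    """Generate text by repeatedly sampling a successor of the last word."""
    rng = random.Random(seed)
    out = [start]
    while len(out) < n_words:
        nxt = successors.get(out[-1])
        if not nxt:  # dead end: last word was never seen mid-corpus
            break
        out.append(rng.choice(nxt))
    return " ".join(out)

print(generate("the", 10))
```

The output is locally plausible but globally meaningless, which is exactly the point of Shannon's series of increasingly ordered approximations to English.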
I think 2-3 years ago I said I would not work on two ML sub-areas. RL was one of them. I am happy to say that I am not strongly attached to my beliefs.
Every eval chokes under hill climbing. If we're lucky, there's an early phase where *real* learning (for both the model and the community) can occur. I'd argue that a benchmark's value lies entirely in that window. So the real question is: what did we learn?
In Greek, συκοφάντης (sykophántēs) most typically refers to a malicious slanderer, someone spreading lies, not flattery!
Every time you use it, you’re technically using it wrong :D
We're hiring at the Senior Researcher level (e.g., post-PhD).
Please drop me a DM if you apply!
jobs.careers.microsoft.com/us/en/job/17...
But I think multiplication, addition, maze solving, and easy-to-hard generalization are actually solvable on standard transformers...
with recursive self-improvement
Below is the accuracy of a tiny model teaching itself how to add and multiply
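The self-improvement data loop can be sketched roughly like this. This is a generic version of the recipe with hypothetical names and a noisy stand-in "model", not the paper's exact implementation: a model trained on easier problems labels slightly harder ones, majority voting over samples filters out unreliable labels, and the surviving pairs become the next round's training data.

```python
import random
from collections import Counter

def majority_vote(samples):
    """Keep a label only if a strict majority of samples agree on it."""
    answer, count = Counter(samples).most_common(1)[0]
    return answer if count > len(samples) / 2 else None

def self_improve_round(model, digits, n_problems=100, n_samples=5, rng=None):
    """One round: self-label harder problems, keep only confident labels."""
    rng = rng or random.Random(0)
    new_data = []
    for _ in range(n_problems):
        a = rng.randrange(10 ** (digits - 1), 10 ** digits)
        b = rng.randrange(10 ** (digits - 1), 10 ** digits)
        votes = [model(a, b, rng) for _ in range(n_samples)]
        label = majority_vote(votes)
        if label is not None:  # confident self-label joins the training set
            new_data.append(((a, b), label))
    return new_data

# Stand-in for a model trained on easier problems: mostly right, sometimes off.
def noisy_adder(a, b, rng):
    return a + b if rng.random() < 0.8 else a + b + rng.choice([-1, 1])

data = self_improve_round(noisy_adder, digits=3)
accuracy = sum(label == a + b for (a, b), label in data) / len(data)
```

The point of the sketch: even though each individual self-label is noisy, majority filtering makes the retained training data much cleaner than the model's raw per-sample accuracy, which is what lets the next round learn from it.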
But he thinks multiplication, addition, maze solving, and easy-to-hard generalization are actually solvable on standard transformers...
with recursive self-improvement, as presented by @dimitrisp.bsky.social
Paper on arxiv coming on Monday.
Link to a talk I gave on this below 👇
Super excited about this work!
Talk: youtube.com/watch?v=szhE...
Slides: tinyurl.com/SelfImprovem...
👉 WebPage: tinyurl.com/ye2awe8m
🎥 YouTube: tinyurl.com/2wwjru6z
2018 ResNet: a more accurate model is trainable in half an hour on a single GPU.
What stops this from happening for LLMs?
What a world we're in where this well-trodden pattern rocks financial markets and escalates geopolitical conflict.
The increase in "yapping" in reasoning models, as they are trained for more rounds of RL, "emerges" (sorry :D) as models discover that verbose reasoning helps them achieve better rewards (e.g., higher accuracy).
Here's a new one for me: Drawing with Logo (yes the turtle)!
To be fair, drawing with Logo is hard. But... here are 8 examples with Sonnet 3.6 vs o1.
Example 1/8: Draw the letter G
arxiv.org/pdf/2501.09240
- be a good dad and partner
- do more nature stuff
- walk & run more
- spend more time in the water
- think deeper
- read good books, don’t feel bad not finishing all
- be a good mentor & colleague
- figure out what reasoning is
- don’t be reward hacking
- have fun
All doable