Self-play = autonomous improvement without human supervision. Simple games improve general reasoning!
Without RAE (Role-conditioned Advantage Estimation): "thinking collapse" - responses crash 3500→0 chars, math drops 66%
RAE keeps reasoning alive!
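
A rough sketch of the idea, assuming RAE works by keeping a separate running baseline for each (game, role) pair and subtracting it from the player's return. The class name, the `decay` value, and the update order are illustrative assumptions, not taken from the SPIRAL code:

```python
from collections import defaultdict


class RoleBaseline:
    """Hypothetical helper: one moving-average baseline per (game, role)."""

    def __init__(self, decay: float = 0.95):
        self.decay = decay
        # Every (game, role) baseline starts at 0.0 (an assumption).
        self.baselines = defaultdict(float)

    def advantage(self, game: str, role: int, ret: float) -> float:
        key = (game, role)
        # Update the running baseline with the new return ...
        self.baselines[key] = (
            self.decay * self.baselines[key] + (1 - self.decay) * ret
        )
        # ... then score the return relative to that role's baseline.
        return ret - self.baselines[key]


rb = RoleBaseline(decay=0.5)
first = rb.advantage("kuhn_poker", role=0, ret=1.0)
second = rb.advantage("kuhn_poker", role=0, ret=1.0)
other_role = rb.advantage("kuhn_poker", role=1, ret=-1.0)
```

With a per-role baseline, a role that wins more often (say, from a first-move edge) stops earning extra credit for that edge alone, which is one plausible reading of why the training signal stays stable.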
Single game: ~41% reasoning average
Multi-game: 42.7% - skills synergize!
Even strong models improve:
DeepSeek-R1-Distill-Qwen-7B jumps 59.7%→61.7%. AIME'25 +10 points! 📈
TicTacToe → spatial (56% on Snake)
Kuhn Poker → probabilistic (91.7% on Pig Dice!)
Simple Negotiation → strategic (55.8% on Truth & Deception)
Each game develops distinct abilities!
Self-play: 39.7% math, 47.8% general reasoning
Fixed opponents: Much worse
Random: Complete collapse
Key: as you improve, so does your opponent. Fixed opponents become too easy.
📊 Expected Value Calculation
🔍 Case-by-Case Analysis
🎯 Pattern Recognition
These patterns from games transfer to math benchmarks. Games teach generalizable thinking!
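
To make the first pattern concrete, here is a small expected-value calculation of the kind a poker-style decision calls for. The pot size, bet size, and win probability are made-up numbers for illustration, not figures from the paper:

```python
def expected_value(outcomes):
    """Expected value of a list of (probability, payoff) pairs."""
    total_p = sum(p for p, _ in outcomes)
    if abs(total_p - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(p * payoff for p, payoff in outcomes)


# Hypothetical spot: calling costs 1 chip, the pot holds 3 chips,
# and we estimate a 1/3 chance of holding the better card.
call_ev = expected_value([(1 / 3, 3.0), (2 / 3, -1.0)])
fold_ev = 0.0  # folding neither wins nor loses additional chips

best_action = "call" if call_ev > fold_ev else "fold"
```

The same weigh-each-case-by-its-probability step is what shows up again in probability and counting problems on math benchmarks.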
SPIRAL: models learn via self-competition. Kuhn Poker → +8.7% math, +18.1 Minerva Math! 🃏
Paper: huggingface.co/papers/2506....
Code: github.com/spiral-rl/spiral