Scalable reasoning data generation
Training specialized generators for hard problems
Reducing dependence on human-labeled data
Future: Real-time difficulty estimation for RL
See more details in our paper.
Thanks for reading!
🧵5/5
🧵4/5
Two key innovations:
1. Difficulty-aware graph sampling: selects concept combinations that lead to harder problems.
2. Rejection fine-tuning: trains generators to produce increasingly difficult problems.
🧵3/5
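A rough sketch of how the two ideas above might fit together, in Python. Everything here is hypothetical and for illustration only: the concept graph, its edge weights, `sample_concepts`, `rejection_filter`, and the pass-rate difficulty proxy are assumptions, not the paper's actual implementation.

```python
import random

# Hypothetical concept graph. Edge weights are assumed difficulty scores
# for problems that combine the two concepts (higher = harder).
CONCEPT_GRAPH = {
    "dp": {"graphs": 0.9, "greedy": 0.3},
    "graphs": {"dp": 0.9, "math": 0.6},
    "greedy": {"dp": 0.3, "math": 0.2},
    "math": {"graphs": 0.6, "greedy": 0.2},
}


def sample_concepts(graph, start, length, rng):
    """Random walk that prefers edges with a higher difficulty weight."""
    path = [start]
    while len(path) < length:
        # Only consider neighbors that are not already in the combination.
        neighbors = {c: w for c, w in graph[path[-1]].items() if c not in path}
        if not neighbors:
            break
        names = list(neighbors)
        weights = [neighbors[n] for n in names]
        path.append(rng.choices(names, weights=weights, k=1)[0])
    return path


def rejection_filter(candidates, difficulty_fn, threshold):
    """Keep only generated problems whose estimated difficulty meets the threshold.

    The kept problems would then serve as fine-tuning data for the generator.
    """
    return [c for c in candidates if difficulty_fn(c) >= threshold]


# Usage: pick a concept combination, then filter candidate problems.
rng = random.Random(0)
concepts = sample_concepts(CONCEPT_GRAPH, "dp", 3, rng)

# Assumed difficulty proxy: 1 minus a solver's pass rate on the problem.
candidates = [
    {"text": "easy problem", "pass_rate": 0.9},
    {"text": "hard problem", "pass_rate": 0.1},
]
kept = rejection_filter(candidates, lambda c: 1 - c["pass_rate"], 0.5)
```

Repeating the filter-then-fine-tune loop with a rising threshold is one way a generator could be pushed toward harder problems over successive rounds.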
Current reasoning-problem data hits a wall:
- Competitive coding datasets: only 10-30K problems
- Creating hard problems needs PhD-level experts
- Existing synthetic methods don't specialize in difficulty
🧵2/5