Hanxu Hu
@hanxuhu.bsky.social
Researching Post-Training of LLMs
Joint work with Xingxing Zhang, @vamvas.bsky.social, @ricosennrich.bsky.social, and Furu Wei.
October 21, 2025 at 2:01 PM
Overall, QueST opens new possibilities:
Scalable reasoning data generation
Training specialized generators for hard problems
Reducing dependence on human-labeled data
Future: Real-time difficulty estimation for RL
See more details in our paper.
Thanks for reading!
🧵5/5
October 21, 2025 at 2:01 PM
📊 RESULTS: State-of-the-art performance at the 8B scale. Qwen3-8B-Base trained on our 212K synthetic problems matches DeepSeek-R1-671B on LiveCodeBench (LCB)!
🧵4/5
October 21, 2025 at 2:01 PM
🎯 OUR SOLUTION: QueST
Two key innovations (rough sketch after this list):
1. Difficulty-aware graph sampling: selects concept combinations that lead to harder problems.
2. Rejection fine-tuning: trains the generator to produce increasingly difficult problems.
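A rough sketch of how these two pieces could fit together; the concept graph, per-concept difficulty scores, and the estimate_difficulty / generator functions below are illustrative assumptions, not the exact QueST implementation:

```python
import random

def sample_concept_combo(graph, difficulty, k=3, temperature=2.0):
    """Difficulty-aware graph sampling: pick k connected concepts,
    biasing the walk towards concepts with higher (assumed) difficulty scores.
    `graph` maps concept -> list of neighbouring concepts."""
    concepts = list(graph)
    weights = [difficulty[c] ** temperature for c in concepts]
    combo = [random.choices(concepts, weights=weights, k=1)[0]]
    while len(combo) < k:
        # Grow the combination along graph edges, again favouring hard neighbours.
        neighbours = [n for c in combo for n in graph[c] if n not in combo]
        if not neighbours:
            break
        w = [difficulty[n] ** temperature for n in neighbours]
        combo.append(random.choices(neighbours, weights=w, k=1)[0])
    return combo

def rejection_finetune_data(generator, estimate_difficulty, combos, threshold=0.7):
    """Rejection fine-tuning data: keep only generated problems that an
    (assumed) difficulty estimator scores above `threshold`; the kept pairs
    are then used to fine-tune the problem generator itself."""
    kept = []
    for combo in combos:
        problem = generator(combo)  # e.g. an LLM prompted with the sampled concepts
        if estimate_difficulty(problem) >= threshold:
            kept.append({"concepts": combo, "problem": problem})
    return kept
```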
🧵3/5
October 21, 2025 at 2:01 PM
📊 THE PROBLEM
Current reasoning-problem data hits a wall:
- Competitive coding datasets: only 10-30K problems
- Creating hard problems requires PhD-level experts
- Existing synthetic methods don't specifically target difficulty
🧵2/5
October 21, 2025 at 2:01 PM
We further propose a source-primed multi-turn variant that lets the LLM read the entire source document first and then translate it in a multi-turn chat. It achieves the best performance among all settings with GPT-4o-mini, Qwen2.5-Instruct, and Llama-3.1-Instruct.
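A minimal sketch of the source-primed setting, assuming a generic chat_fn(messages) wrapper around whichever chat model is used; the prompts here are made up, not the ones from the paper:

```python
def source_primed_translate(chat_fn, segments, src_lang="German", tgt_lang="English"):
    """Source-primed multi-turn translation: the model first sees the whole
    source document, then translates it segment by segment in one chat."""
    document = "\n".join(segments)
    messages = [
        {"role": "system", "content": f"You are a professional {src_lang}-{tgt_lang} translator."},
        # Prime the model with the entire source document up front.
        {"role": "user", "content": f"Here is the full source document:\n\n{document}"},
        {"role": "assistant", "content": "I have read the document and am ready to translate."},
    ]
    translations = []
    for seg in segments:
        messages.append({"role": "user", "content": f"Translate this segment:\n{seg}"})
        out = chat_fn(messages)          # one model call per segment, full history kept
        messages.append({"role": "assistant", "content": out})
        translations.append(out)
    return translations
```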
March 14, 2025 at 2:58 PM
We found that multi-turn translation achieves clearly better performance, since it can access all previously translated segments, while the KV cache keeps the extra inference cost small.
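A back-of-the-envelope illustration of the KV-cache point, with made-up segment lengths: with caching, each source token is prefilled only once, whereas re-encoding the growing history from scratch every turn would cost far more.

```python
def prefill_tokens(seg_lens, cached=True):
    """Count prompt tokens processed when translating a document
    segment by segment in a multi-turn chat."""
    total, history = 0, 0
    for n in seg_lens:
        if cached:
            total += n            # only the new segment is prefilled; history sits in the KV cache
        else:
            total += history + n  # without a cache, the whole history is re-encoded each turn
        history += n              # (ignoring generated translation tokens for simplicity)
    return total

seg_lens = [120] * 50                           # e.g. 50 segments of ~120 tokens each
print(prefill_tokens(seg_lens, cached=True))    # 6000 tokens prefilled in total
print(prefill_tokens(seg_lens, cached=False))   # 153000 tokens re-encoded in total
```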
March 14, 2025 at 2:58 PM
We started with a comparison of previous baseline settings: inputting the whole source document at once (single-turn), segment-level translation, and multi-turn translation, where segments are translated progressively with previous turns kept in the cache.
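For concreteness, the three settings can be sketched roughly as follows (chat_fn is again an assumed chat-model wrapper, and the prompts are illustrative):

```python
def single_turn(chat_fn, segments):
    # Whole document in, whole translation out, in a single call.
    return chat_fn([{"role": "user",
                     "content": "Translate this document:\n" + "\n".join(segments)}])

def segment_level(chat_fn, segments):
    # Each segment translated independently, with no document context.
    return [chat_fn([{"role": "user", "content": f"Translate:\n{seg}"}]) for seg in segments]

def multi_turn(chat_fn, segments):
    # Segments translated progressively; earlier turns stay in the conversation
    # (and, server-side, in the KV cache).
    messages, outputs = [], []
    for seg in segments:
        messages.append({"role": "user", "content": f"Translate:\n{seg}"})
        out = chat_fn(messages)
        messages.append({"role": "assistant", "content": out})
        outputs.append(out)
    return outputs
```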
March 14, 2025 at 2:58 PM