Hanxu Hu
@hanxuhu.bsky.social
Researching Post-Training of LLMs
Joint work with Xingxing Zhang, @vamvas.bsky.social, @ricosennrich.bsky.social, and Furu Wei.
October 21, 2025 at 2:01 PM
Overall, QueST opens new possibilities:
Scalable reasoning data generation
Training specialized generators for hard problems
Reducing dependence on human-labeled data
Future: Real-time difficulty estimation for RL
See more details in our paper.
Thanks for reading!
🧵5/5
October 21, 2025 at 2:01 PM
📊 RESULTS: State-of-the-art performance at the 8B scale. Qwen3-8B-Base trained on our 212K synthetic problems matches DeepSeek-R1-671B on LiveCodeBench (LCB)!
🧵4/5
October 21, 2025 at 2:01 PM
🎯 OUR SOLUTION: QueST
Two key innovations (rough sketch after this list):
1. Difficulty-aware graph sampling: selects concept combinations that lead to harder problems.
2. Rejection fine-tuning: trains the generator to produce increasingly difficult problems.
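A rough sketch of how these two pieces could fit together; the concept graph, per-concept difficulty scores, and the estimate_difficulty / generator functions below are illustrative assumptions, not the exact QueST implementation:

```python
import random

def sample_concept_combo(graph, difficulty, k=3, temperature=2.0):
    """Difficulty-aware graph sampling: pick k connected concepts,
    biasing the walk towards concepts with higher (assumed) difficulty scores.
    `graph` maps concept -> list of neighbouring concepts."""
    concepts = list(graph)
    weights = [difficulty[c] ** temperature for c in concepts]
    combo = [random.choices(concepts, weights=weights, k=1)[0]]
    while len(combo) < k:
        # Grow the combination along graph edges, again favouring hard neighbours.
        neighbours = [n for c in combo for n in graph[c] if n not in combo]
        if not neighbours:
            break
        w = [difficulty[n] ** temperature for n in neighbours]
        combo.append(random.choices(neighbours, weights=w, k=1)[0])
    return combo

def rejection_finetune_data(generator, estimate_difficulty, combos, threshold=0.7):
    """Rejection fine-tuning data: keep only generated problems that an
    (assumed) difficulty estimator scores above `threshold`; the kept pairs
    are then used to fine-tune the problem generator itself."""
    kept = []
    for combo in combos:
        problem = generator(combo)  # e.g. an LLM prompted with the sampled concepts
        if estimate_difficulty(problem) >= threshold:
            kept.append({"concepts": combo, "problem": problem})
    return kept
```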
🧵3/5
October 21, 2025 at 2:01 PM
📊 THE PROBLEM
Current reasoning-problem data hits a wall:
- Competitive coding datasets: only 10-30K problems
- Creating hard problems requires PhD-level experts
- Existing synthetic methods don't specifically target difficulty
🧵2/5
October 21, 2025 at 2:01 PM
We further propose a source-primed multi-turn variant that lets the LLM read the entire source document first and then translate it in a multi-turn chat. It achieves the best performance among all settings with GPT-4o-mini, Qwen2.5-Instruct, and Llama-3.1-Instruct.
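A minimal sketch of the source-primed setting, assuming a generic chat_fn(messages) wrapper around whichever chat model is used; the prompts here are made up, not the ones from the paper:

```python
def source_primed_translate(chat_fn, segments, src_lang="German", tgt_lang="English"):
    """Source-primed multi-turn translation: the model first sees the whole
    source document, then translates it segment by segment in one chat."""
    document = "\n".join(segments)
    messages = [
        {"role": "system", "content": f"You are a professional {src_lang}-{tgt_lang} translator."},
        # Prime the model with the entire source document up front.
        {"role": "user", "content": f"Here is the full source document:\n\n{document}"},
        {"role": "assistant", "content": "I have read the document and am ready to translate."},
    ]
    translations = []
    for seg in segments:
        messages.append({"role": "user", "content": f"Translate this segment:\n{seg}"})
        out = chat_fn(messages)          # one model call per segment, full history kept
        messages.append({"role": "assistant", "content": out})
        translations.append(out)
    return translations
```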
March 14, 2025 at 2:58 PM
We found that multi-turn translation achieves clearly better performance, since it can access all previously translated segments, while the KV cache keeps the extra inference cost small.
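A back-of-the-envelope illustration of the KV-cache point, with made-up segment lengths: with caching, each source token is prefilled only once, whereas re-encoding the growing history from scratch every turn would cost far more.

```python
def prefill_tokens(seg_lens, cached=True):
    """Count prompt tokens processed when translating a document
    segment by segment in a multi-turn chat."""
    total, history = 0, 0
    for n in seg_lens:
        if cached:
            total += n            # only the new segment is prefilled; history sits in the KV cache
        else:
            total += history + n  # without a cache, the whole history is re-encoded each turn
        history += n              # (ignoring generated translation tokens for simplicity)
    return total

seg_lens = [120] * 50                           # e.g. 50 segments of ~120 tokens each
print(prefill_tokens(seg_lens, cached=True))    # 6000 tokens prefilled in total
print(prefill_tokens(seg_lens, cached=False))   # 153000 tokens re-encoded in total
```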
March 14, 2025 at 2:58 PM
We started with a comparison of previous baseline settings: inputting the whole source document at once (single-turn), segment-level translation, and multi-turn translation, where segments are translated progressively with previous turns kept in the cache.
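For concreteness, the three settings can be sketched roughly as follows (chat_fn is again an assumed chat-model wrapper, and the prompts are illustrative):

```python
def single_turn(chat_fn, segments):
    # Whole document in, whole translation out, in a single call.
    return chat_fn([{"role": "user",
                     "content": "Translate this document:\n" + "\n".join(segments)}])

def segment_level(chat_fn, segments):
    # Each segment translated independently, with no document context.
    return [chat_fn([{"role": "user", "content": f"Translate:\n{seg}"}]) for seg in segments]

def multi_turn(chat_fn, segments):
    # Segments translated progressively; earlier turns stay in the conversation
    # (and, server-side, in the KV cache).
    messages, outputs = [], []
    for seg in segments:
        messages.append({"role": "user", "content": f"Translate:\n{seg}"})
        out = chat_fn(messages)
        messages.append({"role": "assistant", "content": out})
        outputs.append(out)
    return outputs
```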
March 14, 2025 at 2:58 PM