Yining Lu
yininglu.bsky.social
Second year CS PhD student @notredame.bsky.social | Intern: Amazon | Prev: @jhuclsp.bsky.social
https://yining610.github.io/
Work done during an internship at @amazon. Huge thanks to my mentor, @zlwang_cs, and advisor, @Meng_CS, for their support in making this work possible, and to collaborators @ShiyangLi5, Xin Liu, Changlong Yu, @YinQingyu, Zhan Shi, and @zhangzxUIUC for their valuable feedback!
September 16, 2025 at 6:15 PM
8/8 [Convergence rate]
The gradient-based method consistently has a higher convergence rate, reducing the required steps by 6.1 on average across RL algorithms.
September 16, 2025 at 6:15 PM
7/8 [Generalizability]
We further extend experiments to different math datasets and model families. Our two methods yield superior Pareto fronts compared to the baseline, with the gradient-based weighting showing the best overall performance.
September 16, 2025 at 6:15 PM
6/8 [Gradient-based weight optimization]
Our method generates superior Pareto fronts that dominate all baseline approaches under both GRPO and REINFORCE training.
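For readers new to the term, here is a minimal sketch of what it means for one Pareto front to dominate another, assuming every objective is "higher is better". The function names and the example scores are hypothetical illustrations, not taken from the paper's code.

```python
from typing import Sequence, Tuple

Point = Tuple[float, ...]

def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is at least as good as b on every objective and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def front_dominates(front_a: Sequence[Point], front_b: Sequence[Point]) -> bool:
    """True if every point in front_b is dominated by at least one point in front_a."""
    return all(any(dominates(a, b) for a in front_a) for b in front_b)

# Made-up (accuracy, conciseness) scores, for illustration only.
ours = [(0.80, 0.40), (0.70, 0.60)]
baseline = [(0.75, 0.35), (0.65, 0.55)]
print(front_dominates(ours, baseline))
```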
September 16, 2025 at 6:15 PM
5/8 [Hypervolume-guided weight adaptation]
Across all three online RL algorithms, there is consistently at least one weight configuration where our method outperforms the baselines on all objectives.
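As background on the hypervolume indicator itself, the sketch below computes it for two objectives under maximization: the area dominated by a set of points and bounded by a reference point. This shows only the standard indicator, not how our method uses it to adapt weights. The function name, the reference point, and the example scores are assumptions made for illustration.

```python
from typing import Iterable, Tuple

def hypervolume_2d(points: Iterable[Tuple[float, float]],
                   ref: Tuple[float, float] = (0.0, 0.0)) -> float:
    """Area dominated by `points` and bounded below by `ref` (both objectives maximized)."""
    # Keep only points that are strictly better than the reference on both objectives.
    pts = [(x, y) for x, y in points if x > ref[0] and y > ref[1]]
    # Sweep from the largest first objective to the smallest.
    pts.sort(key=lambda p: p[0], reverse=True)
    area = 0.0
    best_y = ref[1]
    for x, y in pts:
        if y > best_y:
            # Add the horizontal strip that this point newly covers.
            area += (x - ref[0]) * (y - best_y)
            best_y = y
    return area

# Made-up (accuracy, conciseness) scores, for illustration only.
front = [(0.8, 0.3), (0.5, 0.6)]
print(hypervolume_2d(front))
```

A larger hypervolume means the front covers more of the objective space, which is why it is a common scalar summary for comparing Pareto fronts.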
September 16, 2025 at 6:15 PM
Dynamic reward weights show that objectives are learned differently. For example, accuracy is a more challenging objective that requires continual learning, while conciseness quickly converges to 0.2.

4/8
September 16, 2025 at 6:15 PM
3/8 [Preliminary finding]
Different objectives vary in learning difficulty. Each objective reaches saturation at different training stages.
September 16, 2025 at 6:15 PM
Question: How can we redirect learning effort toward the objectives with the greatest potential for improvement?

Answer:
- If the user preference for objectives is given, use our hypervolume-based method.
- If the user preference is unknown, use our gradient-based method.
2/8
September 16, 2025 at 6:15 PM
This is our teaser video 😀
youtu.be/TgloG4Oefeg
ACL2025: Optimizing Decomposition for Optimal Claim Verification
July 25, 2025 at 10:11 PM