Yining Lu
@yininglu.bsky.social
Second year CS PhD student @notredame.bsky.social | Intern: Amazon | Prev: @jhuclsp.bsky.social
https://yining610.github.io/
8/8 [Convergence rate]
The gradient-based method consistently converges faster, reducing the number of required training steps by 6.1 on average across RL algorithms.
September 16, 2025 at 6:15 PM
7/8 [Generalizability]
We further extend experiments to different math datasets and model families. Our two methods yield superior Pareto fronts compared to the baseline, with the gradient-based weighting showing the best overall performance.
September 16, 2025 at 6:15 PM
6/8 [Gradient-based weight optimization]
Our method yields Pareto fronts that dominate all baseline approaches under both GRPO and REINFORCE training.
September 16, 2025 at 6:15 PM
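To make "gradient-based weighting" concrete: one common family of such methods balances objectives by the magnitudes of their per-objective gradients (GradNorm-style normalization). The sketch below is illustrative only, under that assumption; the function name and the inverse-norm rule are mine, not necessarily the paper's exact update.

```python
import numpy as np

def gradient_based_weights(grads, eps=1e-8):
    """Illustrative sketch (not the paper's exact rule): set objective
    weights inversely proportional to per-objective gradient norms, so
    no single objective's gradient dominates the combined update.
    Returns weights that sum to 1."""
    norms = np.array([np.linalg.norm(g) for g in grads])
    inv = 1.0 / (norms + eps)   # eps guards against a zero-norm gradient
    return inv / inv.sum()

# Example: the first objective's gradient is 5x larger than the second's,
# so the second objective receives 5x the weight to balance the update.
w = gradient_based_weights([np.array([3.0, 4.0]),    # ||g|| = 5
                            np.array([0.6, 0.8])])   # ||g|| = 1
```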
5/8 [Hypervolume-guided weight adaptation]
Across all three online RL algorithms, there is consistently at least one weight configuration for which our method outperforms the baselines on all objectives.
September 16, 2025 at 6:15 PM
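For readers unfamiliar with the hypervolume indicator the post refers to: it measures the objective-space area (here, 2-D for simplicity) dominated by a set of reward points relative to a reference point, and weights can be steered toward whichever objective would grow it most. This is a minimal sketch under those assumptions; both function names and the probe-step rule are hypothetical, not the paper's implementation.

```python
def hypervolume_2d(points, ref):
    """Area dominated by a set of 2-D points relative to a reference
    point, with both objectives maximized: sweep the Pareto staircase
    from the largest first objective downward."""
    hv, prev_y = 0.0, ref[1]
    for x, y in sorted(points, key=lambda p: p[0], reverse=True):
        if y > prev_y:                        # skip dominated points
            hv += (x - ref[0]) * (y - prev_y)
            prev_y = y
    return hv

def hypervolume_guided_weights(front, current, ref, step=0.05):
    """Illustrative sketch: weight each objective by the marginal
    hypervolume gained from a small probe step along that objective,
    then normalize the gains into weights."""
    base = hypervolume_2d(front + [tuple(current)], ref)
    gains = []
    for j in range(len(current)):
        probe = list(current)
        probe[j] += step
        gains.append(hypervolume_2d(front + [tuple(probe)], ref) - base)
    total, n = sum(gains), len(gains)
    return [g / total for g in gains] if total > 0 else [1.0 / n] * n
```

In this scheme, an objective whose improvement would expand the dominated region the most receives the largest reward weight at that training step.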
4/8
Dynamic reward weights show that objectives are learned differently. For example, accuracy is a more challenging objective that requires continual learning, while conciseness quickly converges to 0.2.
September 16, 2025 at 6:15 PM
3/8 [Preliminary finding]
Different objectives vary in learning difficulty. Each objective reaches saturation at different training stages.
September 16, 2025 at 6:15 PM
✴️ Pleased to introduce our new paper yining610.github.io/dynamic-rew...

- Rebalance multiple objectives during training through dynamic reward weighting
- Build Pareto-dominant fronts over static baselines across online RL algorithms, datasets, and model families
- Achieve faster convergence

1/8
September 16, 2025 at 6:15 PM
Can't make it to #ACL2025 this year, but if you're interested in RL for factuality and textual decomposition, please check out our paper!

TL;DR: We identify a mismatch between the decomposition policy and the LLM verifier, and propose a dynamic training paradigm to bridge the gap.
July 25, 2025 at 10:11 PM
Quick reminder that our paper, Benchmarking Language Model Creativity: A Case Study on Code Generation, will be presented today!

📅 11AM-12:30PM, Fri, May 2
📍 Hall 3
📝 arxiv.org/abs/2407.09007
🎥 www.youtube.com/watch?v=v1c...
May 2, 2025 at 1:11 PM