https://yining610.github.io/
The gradient-based method consistently converges faster, reducing the required training steps by 6.1 on average across RL algorithms.
We further extend experiments to different math datasets and model families. Our two methods yield superior Pareto fronts compared to the baseline, with the gradient-based weighting showing the best overall performance.
Our method generates superior Pareto fronts that dominate all baseline approaches under both GRPO and REINFORCE training.
Across all three online RL algorithms, there is consistently at least one weight configuration under which our method outperforms the baselines on all objectives.
4/8
Different objectives vary in learning difficulty, and each reaches saturation at a different training stage.
- Rebalance multiple objectives during training through dynamic reward weighting (minimal sketch below)
- Build a Pareto-dominant front over static baselines across online RL algorithms, datasets, and model families
- Faster convergence
1/8
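For intuition, here is a minimal sketch of what gradient-based dynamic reward weighting can look like. The function names (`gradient_based_weights`, `scalarize`), the softmax-over-gradient-norms rule, and the `temperature` parameter are illustrative assumptions, not the paper's exact update.

```python
import numpy as np

def gradient_based_weights(grad_norms, temperature=1.0):
    """Illustrative reweighting rule (an assumption, not the paper's exact update):
    give more weight to objectives whose reward gradients are still large,
    i.e. objectives that have not yet saturated."""
    g = np.asarray(grad_norms, dtype=float)
    logits = g / temperature
    logits -= logits.max()              # numerical stability for the softmax
    w = np.exp(logits)
    return w / w.sum()

def scalarize(per_objective_rewards, weights):
    """Collapse per-objective rewards into the single scalar used by the RL update."""
    return float(np.dot(weights, per_objective_rewards))

# Toy usage: objective 0 still improves quickly, objective 2 has nearly saturated,
# so the dynamic weights shift credit toward the harder, unsaturated objectives.
weights = gradient_based_weights([0.9, 0.4, 0.05])
reward = scalarize([0.2, 0.6, 0.95], weights)
print(weights, reward)
```

A static baseline is the special case where `weights` never changes; the dynamic variant recomputes it during training as the per-objective gradient statistics drift.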
TL;DR: We identify a mismatch between the decomposition policy and the LLM verifier, and propose a dynamic training paradigm to bridge the gap.
📅 11AM-12:30PM, Fri, May 2
📍 Hall 3
📝 arxiv.org/abs/2407.09007
🎥 www.youtube.com/watch?v=v1c...