https://yining610.github.io/
The gradient-based method consistently converges faster, reducing the required training steps by 6.1 on average across RL algorithms.
We further extend experiments to different math datasets and model families. Our two methods yield superior Pareto fronts compared to the baseline, with the gradient-based weighting showing the best overall performance.
Our method generates superior Pareto fronts that dominate all baseline approaches under both GRPO and REINFORCE training.
Across all three online RL algorithms, there is consistently at least one weight configuration under which our method outperforms the baselines on all objectives.
4/8
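Concretely, "outperforms the baselines on all objectives" is Pareto dominance. A minimal sketch of the dominance check and front extraction (illustrative Python only; the objective values are made up and this is not the paper's evaluation code):

```python
from typing import Sequence

def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if `a` Pareto-dominates `b`: at least as good on every
    objective and strictly better on at least one (all maximized)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(points: list) -> list:
    """Keep only the non-dominated points."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q is not p)]

# Toy scores on two objectives, purely for illustration.
print(pareto_front([(0.8, 0.6), (0.7, 0.5), (0.6, 0.9)]))
# -> [(0.8, 0.6), (0.6, 0.9)]   ((0.7, 0.5) is dominated by (0.8, 0.6))
```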
Different objectives vary in learning difficulty. Each objective reaches saturation at different training stages.
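One way to act on this observation is to re-weight objectives by their recent rate of improvement, so that objectives which have already saturated stop dominating the reward. The sketch below is only an illustration of that general idea of dynamic reward weighting, not the paper's method; the function name, window size, and temperature are all assumptions.

```python
import numpy as np

def dynamic_weights(reward_history: np.ndarray,
                    window: int = 10, temp: float = 1.0) -> np.ndarray:
    """reward_history: (num_steps, num_objectives) per-objective rewards.
    Weight each objective by its recent rate of improvement, so objectives
    that have saturated receive less weight. Hyperparameters are arbitrary."""
    recent = reward_history[-window:]
    t = np.arange(len(recent))
    # Slope of a least-squares line per objective (np.polyfit accepts 2D y).
    slopes = np.polyfit(t, recent, deg=1)[0]
    # Softmax over improvement rates -> weights that sum to 1.
    z = slopes / temp
    w = np.exp(z - z.max())
    return w / w.sum()
```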
Answer:
- If the user's preference over the objectives is given, use our hypervolume-based method (see the hypervolume sketch after this list).
- If the user preference is unknown, use our gradient-based method.
2/8
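For intuition on the first recommendation: the hypervolume indicator measures how much of objective space a Pareto front dominates relative to a reference point. A minimal 2D maximization version (a sketch for intuition, not the paper's implementation; the example front and reference point are invented):

```python
def hypervolume_2d(front, ref):
    """Hypervolume of a 2D maximization front w.r.t. reference point `ref`,
    which must be dominated by every point on the front."""
    # Sweep points by first objective, descending, accumulating the
    # rectangle each point adds beyond the best second objective so far.
    pts = sorted(front, key=lambda p: p[0], reverse=True)
    hv, prev_y = 0.0, ref[1]
    for x, y in pts:
        if y > prev_y:  # dominated points contribute no new area
            hv += (x - ref[0]) * (y - prev_y)
            prev_y = y
    return hv

# Example: front [(0.6, 0.9), (0.8, 0.6)] with reference (0, 0).
print(hypervolume_2d([(0.6, 0.9), (0.8, 0.6)], (0.0, 0.0)))  # 0.66
```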
youtu.be/TgloG4Oefeg
www.youtube.com/watch?v=v1c...