⭐Reward:
Dense rewards significantly improve multi-turn RL performance, with optimal density varying by RL algorithm.
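A minimal sketch of what "denser" rewards mean in a multi-turn rollout, assuming a generic environment API; the names (env, subgoal_reached, step_bonus) are hypothetical and the paper's actual reward shaping may differ.

```python
# Sketch: sparse vs. dense per-turn rewards for one multi-turn episode.
# Hypothetical env/policy interfaces, not the paper's implementation.

def collect_episode(env, policy, reward_mode="dense", step_bonus=0.1):
    """Roll out one episode and attach a reward to every turn."""
    obs, trajectory, done = env.reset(), [], False
    while not done:
        action = policy.act(obs)
        next_obs, task_done, info = env.step(action)
        # Sparse: only the final turn is rewarded, 1.0 on task success.
        reward = float(task_done and info.get("success", False))
        if reward_mode == "dense":
            # Dense: also reward intermediate subgoals (e.g. grabbing the right
            # object in ALFWorld, or passing a unit test in SWE-Gym).
            reward += step_bonus * float(info.get("subgoal_reached", False))
        trajectory.append((obs, action, reward))
        obs, done = next_obs, task_done
    return trajectory
```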
🤖Policy:
1. Good SFT priors achieve the same performance with fewer RL episodes; however, RL is needed for generalization.
2. Given a fixed compute budget, there's an optimal SFT:RL data ratio.
3. Both PPO/GRPO (biased) and RLOO (unbiased) methods improve over base models (see the baseline sketch below).
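For the biased/unbiased distinction in point 3, here is a minimal sketch of the two advantage baselines, computed over a group of K rollouts sampled for the same task. It is a generic illustration of the estimators, not the training code used in the study.

```python
import numpy as np

def grpo_advantages(returns):
    """GRPO-style: normalize by the group mean and std, which include each
    sample's own return, hence the slight bias."""
    r = np.asarray(returns, dtype=float)
    return (r - r.mean()) / (r.std() + 1e-8)

def rloo_advantages(returns):
    """RLOO: baseline each sample by the mean of the other K-1 returns,
    giving an unbiased policy-gradient estimate."""
    r = np.asarray(returns, dtype=float)
    k = len(r)
    loo_mean = (r.sum() - r) / (k - 1)
    return r - loo_mean

returns = [1.0, 0.0, 0.0, 1.0]      # e.g. task success per rollout
print(grpo_advantages(returns))     # group-normalized advantages
print(rloo_advantages(returns))     # leave-one-out advantages
```

The leave-one-out mean excludes each sample's own return, so the RLOO estimate stays unbiased; GRPO's group normalization instead trades a small bias for lower variance.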
🌎Environment:
1. Agents trained on simpler environments can generalize to more complex environments.
2. Agents trained on a subset of tasks can generalize to unseen tasks.
We study what actually works for agentic multi-turn RL with varying 🌎Environment, 🤖Policy, and ⭐Reward.
We conduct ablations and empirical analyses on 🧩TextWorld, 🧙ALFWorld, and 🧑💻SWE-Gym.