Ruiyi Wang
@ruiyiwang.bsky.social
2nd year PhD at UCSD w/ @rajammanabrolu.bsky.social
Prev: @ltiatcmu.bsky.social @umich.edu
Research: Agents🤖, Reasoning🧠, Games👾
(🧵4/6)
⭐Reward:

Dense rewards significantly improve multi-turn RL performance, with optimal density varying by RL algorithm.
October 26, 2025 at 9:36 PM
(🧵3/6)
🤖Policy:

1. Good SFT priors achieve the same performance with fewer RL episodes; however, RL is needed for generalization.
2. Given a fixed compute budget, there's an optimal SFT:RL data ratio.
3. Both PPO/GRPO (biased) and RLOO (unbiased) methods achieve improvements over base models.
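
To make the biased/unbiased distinction in point 3 concrete, here is a minimal sketch of how group-based advantages are commonly computed for RLOO and GRPO. It is an illustration under assumptions, not the paper's implementation: the function names, the `eps` constant, and the use of population standard deviation are all hypothetical choices. PPO, which typically uses a learned critic, is not shown.

```python
from statistics import mean, pstdev


def rloo_advantages(rewards):
    """Leave-one-out baseline: each sample is compared to the mean reward
    of the OTHER samples drawn for the same prompt."""
    k = len(rewards)
    if k < 2:
        raise ValueError("RLOO needs at least two samples per prompt")
    total = sum(rewards)
    return [r - (total - r) / (k - 1) for r in rewards]


def grpo_advantages(rewards, eps=1e-8):
    """Group-normalized advantage: subtract the group mean (which includes
    the sample itself) and divide by the group standard deviation.
    eps and the population std are assumptions of this sketch."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical episode returns for four rollouts of one task.
returns = [1.0, 0.0, 0.0, 1.0]
rloo = rloo_advantages(returns)
grpo = grpo_advantages(returns)
```

In the RLOO variant the baseline for a sample never depends on that sample's own reward, whereas the GRPO variant's mean and standard deviation both do; this is one commonly cited reason for describing the former as unbiased and the latter as biased.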
October 26, 2025 at 9:36 PM
(🧵2/6) Here are some key takeaways:
🌎Environment:

1. Agents trained on simpler environments can generalize to more complex environments.
2. Agents trained on a subset of tasks can generalize to unseen tasks.
October 26, 2025 at 9:36 PM
🔥Excited to share our new work: "A Practitioner's Guide to Multi-turn Agentic Reinforcement Learning"!

We study what actually works for agentic multi-turn RL with varying 🌎Environment, 🤖Policy, and ⭐Reward.

We conduct various ablations and empirical analyses on 🧩TextWorld, 🧙ALFWorld, and 🧑‍💻SWE-Gym.
October 26, 2025 at 9:36 PM