🌟Huge thanks to my advisor @rajammanabrolu.bsky.social for his invaluable guidance throughout this work! 🙏
Questions/feedback welcome below 👇
🖇️Paper: arxiv.org/abs/2510.01132
💻Code: github.com/pearls-lab/m...
To help the community, we're releasing 🐈🍵Meow-Tea-Taro 💜: a modular framework where you can configure 🌎environments, 🤖policies, and ⭐rewards.
We also provide our recipes, analyses, and tutorials on building agentic multi-turn RL pipelines in the codebase.
Code: github.com/pearls-lab/m...
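To give a flavor of the modularity, here's a rough sketch of what a pipeline config could look like. Every name below is illustrative, not the actual Meow-Tea-Taro API; see the tutorials in the repo for the real config format.

```python
# Hypothetical sketch only -- field names are illustrative, NOT the actual
# Meow-Tea-Taro API. The point is that environment, policy, and reward are
# independently swappable pieces of one multi-turn RL pipeline.
from dataclasses import dataclass
from typing import Optional

@dataclass
class PipelineConfig:
    # 🌎 environment: which multi-turn task suite the agent acts in
    env_name: str = "example_text_env"      # hypothetical environment id
    max_turns: int = 20                     # episode horizon in turns

    # 🤖 policy: base model, optional SFT warm-start, and RL algorithm
    base_model: str = "Qwen/Qwen2.5-7B-Instruct"  # example model id
    sft_checkpoint: Optional[str] = None    # a good SFT prior cuts RL episodes needed
    algorithm: str = "grpo"                 # e.g. "ppo", "grpo", or "rloo"

    # ⭐ reward: how densely intermediate progress is credited
    reward_density: str = "per_subgoal"     # e.g. "terminal", "per_turn", "per_subgoal"

config = PipelineConfig(algorithm="rloo", reward_density="per_turn")
print(config)
```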
⭐Reward:
Dense rewards significantly improve multi-turn RL performance, with optimal density varying by RL algorithm.
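As a toy illustration of what "reward density" means here (not the paper's implementation): the same multi-turn episode can be credited only at the end (sparse) or every time a subgoal completes (dense).

```python
# Toy illustration of reward density, not the paper's implementation.
def episode_rewards(subgoals_done_per_turn: list, task_solved: bool,
                    density: str = "dense") -> list:
    """Return a per-turn reward sequence for one multi-turn episode."""
    n_turns = len(subgoals_done_per_turn)
    if density == "sparse":
        # terminal-only reward: all learning signal arrives at the last turn
        return [0.0] * (n_turns - 1) + [1.0 if task_solved else 0.0]
    # dense reward: partial credit each turn a subgoal is completed
    rewards = [0.1 * k for k in subgoals_done_per_turn]
    rewards[-1] += 1.0 if task_solved else 0.0
    return rewards

print(episode_rewards([0, 1, 0, 2], task_solved=True, density="sparse"))
print(episode_rewards([0, 1, 0, 2], task_solved=True, density="dense"))
```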
🤖Policy:
1. Good SFT priors achieve the same performance with fewer RL episodes; however, RL is needed for generalization.
2. Given a fixed compute budget, there's an optimal SFT:RL data ratio.
3. Both PPO/GRPO (biased) and RLOO (unbiased) achieve improvements over the base models (toy sketch of the baseline difference below).
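On point 3, a toy sketch (not the paper's code) of where the bias comes from: GRPO normalizes against a group baseline that includes a rollout's own reward, while RLOO uses a leave-one-out baseline that excludes it.

```python
# Toy illustration of GRPO vs RLOO advantage estimation for one prompt's
# group of rollouts (not the paper's code).
import numpy as np

def grpo_advantages(rewards: np.ndarray) -> np.ndarray:
    # group-normalized advantage: (r_i - mean(group)) / std(group);
    # the baseline includes r_i itself, which introduces bias
    return (rewards - rewards.mean()) / (rewards.std() + 1e-8)

def rloo_advantages(rewards: np.ndarray) -> np.ndarray:
    # leave-one-out baseline: mean of the *other* rollouts' rewards,
    # independent of r_i, so the gradient estimate stays unbiased
    n = len(rewards)
    baselines = (rewards.sum() - rewards) / (n - 1)
    return rewards - baselines

group = np.array([1.0, 0.0, 0.0, 1.0])  # rewards of 4 rollouts for one prompt
print(grpo_advantages(group))
print(rloo_advantages(group))
```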
🌎Environment:
1. Agents trained on simpler environments can generalize to more complex environments.
2. Agents trained on a subset of tasks can generalize to unseen tasks.