Steven Wu
@zstevenwu.bsky.social
Computer science professor at Carnegie Mellon. Researcher in machine learning. Algorithmic foundations of responsible AI (e.g., privacy, uncertainty quantification), interactive learning (e.g., RLHF).

https://zstevenwu.com/
Reposted by Steven Wu
I was lucky enough to be invited to give a talk on our new paper on the value of RL in fine-tuning at Cornell last week! Because of my poor time management skills, the talk isn't as polished as I'd like, but I think the "vibes" are accurate enough to share: youtu.be/E4b3cSirpsg.
March 6, 2025 at 6:19 PM
Reposted by Steven Wu
1.5 yrs ago, we set out to answer a seemingly simple question: what are we *actually* getting out of RL in fine-tuning? I'm thrilled to share a pearl we found on the deepest dive of my PhD: the value of RL in RLHF seems to come from *generation-verification gaps*. Get ready to 🤿:
March 4, 2025 at 8:59 PM
Reposted by Steven Wu
@gswamy.bsky.social et al. propose SPO, which builds a two-player game from preferences and solves for the minimax winner. Handles non-Markovian, intransitive, and stochastic preferences. Nice empirical eval ranging from small demonstrative domains to a large-scale RL domain (MuJoCo). (A toy sketch of the self-play idea is below the link card.)

arxiv.org/abs/2401.04056

2/3.
A Minimaximalist Approach to Reinforcement Learning from Human Feedback
We present Self-Play Preference Optimization (SPO), an algorithm for reinforcement learning from human feedback. Our approach is minimalist in that it does not require training a reward model nor unst...
arxiv.org
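
A minimal sketch of the self-play win-rate idea behind SPO, as I read the paper. This is my own toy code, not the authors'; the `prefer` oracle and function names are illustrative stand-ins:

```python
import numpy as np

# Toy preference oracle (purely illustrative): prefers the longer completion.
# In practice this would be a learned preference model or human raters.
def prefer(a: str, b: str) -> float:
    return 1.0 if len(a) > len(b) else 0.0

def self_play_win_rates(completions, prefer_fn):
    """Score each completion by its win rate against the other samples drawn
    from the same policy; scores like these can serve as rewards for a
    standard policy-gradient update, with no separate reward model."""
    n = len(completions)
    rewards = np.zeros(n)
    for i in range(n):
        wins = [prefer_fn(completions[i], completions[j]) for j in range(n) if j != i]
        rewards[i] = float(np.mean(wins))
    return rewards

if __name__ == "__main__":
    samples = ["short answer",
               "a somewhat longer answer",
               "the longest answer of the three"]
    print(self_play_win_rates(samples, prefer))  # -> [0.  0.5 1. ]
```

The point, as I understand it, is that the symmetry of the preference game lets a single policy play against its own samples, which is why no reward model is needed.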
November 21, 2024 at 12:30 PM
Reposted by Steven Wu
I have become a fan of game-theoretic approaches to RLHF, so here are two more papers in that category! (with one more tomorrow 😅)

1. Self-Play Preference Optimization (SPO).

2. Direct Nash Optimization (DNO).

🧵 1/3.
Last week, I shared some papers at the intersection of agent/model evaluation and social choice theory.

The last was a position paper on RLHF/alignment.

This week I will share papers (in pairs) on the topic of "game-theoretic or social choice meet alignment/RLHF".

🧵 1/3.
November 21, 2024 at 12:30 PM