Lightnews — Scholar-powered news

Steven Wu

@zstevenwu.bsky.social

Computer science professor at Carnegie Mellon. Researcher in machine learning. Algorithmic foundations of responsible AI (e.g., privacy, uncertainty quantification), interactive learning (e.g., RLHF).

https://zstevenwu.com/

Steven Wu

@zstevenwu.bsky.social

nice....

March 15, 2025 at 12:47 PM

Steven Wu

@zstevenwu.bsky.social

can you present other people's results :-)

March 4, 2025 at 2:18 PM

Steven Wu

@zstevenwu.bsky.social

that makes sense to me.... i should go to bed....

February 6, 2025 at 12:51 AM

Reposted by Steven Wu

Marc Lanctot

@sharky6000.bsky.social

@gswamy.bsky.social et al. propose SPO, which builds a game from preferences and solves for the minimax winner. Handles non-Markovian, intransitive, and stochastic preferences. Nice empirical eval ranging from small demonstrative domains to a huge RL domain (MuJoCo).

arxiv.org/abs/2401.04056

2/3.

A Minimaximalist Approach to Reinforcement Learning from Human Feedback

We present Self-Play Preference Optimization (SPO), an algorithm for reinforcement learning from human feedback. Our approach is minimalist in that it does not require training a reward model nor unst...

November 21, 2024 at 12:30 PM

Steven Wu

@zstevenwu.bsky.social

1....

November 21, 2024 at 12:40 AM
