Afra Amini
@afraamn.bsky.social
Ph.D. Student @ ETH Zürich
Finally, we plot the reward–KL Pareto frontier across various KL regularization settings. We find that the RB estimator more effectively constrains the KL divergence, and models trained with it appear significantly more often on the Pareto front:
May 6, 2025 at 2:59 PM
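For readers unfamiliar with the metric: a run sits on the reward–KL Pareto front if no other run reaches a higher reward at a lower (or equal) KL. Below is a tiny Python sketch of how such a front can be read off a list of (KL, reward) pairs; the numbers are made up for illustration, not results from the paper.

```python
def pareto_front(points):
    """Return the (kl, reward) pairs not dominated by any other point.

    A point is dominated if some other point has KL <= its KL and
    reward >= its reward, with at least one inequality strict.
    """
    front = []
    for kl, r in points:
        dominated = any(
            kl2 <= kl and r2 >= r and (kl2 < kl or r2 > r)
            for kl2, r2 in points
        )
        if not dominated:
            front.append((kl, r))
    return sorted(front)

# Made-up (KL, reward) measurements for a handful of runs.
runs = [(2.0, 0.61), (3.5, 0.70), (3.0, 0.58), (5.0, 0.71), (4.0, 0.69)]
print(pareto_front(runs))   # [(2.0, 0.61), (3.5, 0.7), (5.0, 0.71)]
```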
In RLHF training, using our RB estimator yields more stable runs compared to the MC estimator. It achieves high rewards while reliably preventing the KL divergence from increasing beyond an acceptable range:
May 6, 2025 at 2:59 PM
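For context on where the estimator enters training: in a typical PPO-style RLHF setup, a per-token KL penalty is subtracted from the reward, with the scalar reward added at the final token. Here is a minimal PyTorch sketch of that reward-shaping step with the RB per-token term swapped in; the function name, shapes, and β value are illustrative, and the paper's exact training setup may differ.

```python
import torch
import torch.nn.functional as F

def shape_rewards(reward, policy_logits, ref_logits, sampled_ids, beta=0.1, use_rb=True):
    """Subtract a per-token KL penalty from the reward signal.

    reward:        (batch,) scalar reward for each completion
    policy_logits: (batch, seq_len, vocab) logits of the policy being trained
    ref_logits:    (batch, seq_len, vocab) logits of the frozen reference model
    sampled_ids:   (batch, seq_len) tokens actually sampled from the policy
    """
    logp = F.log_softmax(policy_logits, dim=-1)
    logq = F.log_softmax(ref_logits, dim=-1)
    if use_rb:
        # RB-style per-token term: exact KL between the two next-token distributions.
        kl_t = (logp.exp() * (logp - logq)).sum(-1)
    else:
        # MC-style per-token term: log-ratio at the sampled token only.
        kl_t = (logp - logq).gather(-1, sampled_ids.unsqueeze(-1)).squeeze(-1)
    per_token_reward = -beta * kl_t                # (batch, seq_len)
    per_token_reward[:, -1] += reward              # scalar reward lands on the last token
    return per_token_reward

# Usage with illustrative shapes.
B, T, V = 4, 16, 100
out = shape_rewards(torch.randn(B), torch.randn(B, T, V), torch.randn(B, T, V),
                    torch.randint(V, (B, T)))
print(out.shape)   # torch.Size([4, 16])
```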
When evaluating the KL divergence between the language model before and after preference alignment, our estimator (RB) consistently yields a lower standard deviation across all prompts than every other estimator available in public RLHF libraries:
May 6, 2025 at 2:59 PM
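A toy numerical illustration of the variance gap (not the paper's experiment): the next-token distributions below are made up and prefix-independent, so the RB estimate is exact here, while with a real LM it would still vary with the sampled prefix, just far less than MC.

```python
import torch

torch.manual_seed(0)
T, V, N = 20, 50, 1000   # sequence length, vocab size, number of samples (all illustrative)

# Fixed, prefix-independent next-token distributions for policy (p) and reference (q).
logp = torch.randn(T, V).log_softmax(-1)
logq = torch.randn(T, V).log_softmax(-1)

# Draw N sequences from the policy and form per-sequence MC estimates:
# sum_t [log p(y_t) - log q(y_t)] evaluated at the sampled tokens only.
ids = torch.distributions.Categorical(logits=logp).sample((N,))   # (N, T)
t_idx = torch.arange(T)
mc = (logp[t_idx, ids] - logq[t_idx, ids]).sum(-1)                # (N,)

# The RB estimate sums the exact per-step KL over the vocabulary; in this
# prefix-independent toy it coincides with the true KL, so it has zero spread.
rb = (logp.exp() * (logp - logq)).sum(-1).sum(-1)

print("true KL:", rb.item())
print("MC mean:", mc.mean().item(), "| MC std:", mc.std().item())
```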
All it took was applying Rao–Blackwellization, a classic variance-reduction technique, to the Monte Carlo (MC) estimator and carefully adapting it to LMs. The recipe is simple: condition on the sampled prefixes and replace each per-token MC term with its conditional expectation:
May 6, 2025 at 2:59 PM
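Concretely, here is a minimal PyTorch sketch of the two estimators, assuming we already have per-token logits from the policy and the reference model for a completion sampled from the policy; the function names and tensor shapes are illustrative rather than the paper's actual implementation (see the linked repo for that).

```python
import torch
import torch.nn.functional as F

def mc_kl_estimate(policy_logits, ref_logits, sampled_ids):
    # Monte Carlo estimate: sum_t [log pi(y_t | y_<t) - log pi_ref(y_t | y_<t)]
    # evaluated only at the sampled tokens. Unbiased, but noisy and can be negative.
    logp = F.log_softmax(policy_logits, dim=-1)
    logq = F.log_softmax(ref_logits, dim=-1)
    lp = logp.gather(-1, sampled_ids.unsqueeze(-1)).squeeze(-1)
    lq = logq.gather(-1, sampled_ids.unsqueeze(-1)).squeeze(-1)
    return (lp - lq).sum(dim=-1)

def rb_kl_estimate(policy_logits, ref_logits):
    # Rao-Blackwellized estimate: for each sampled prefix y_<t, replace the
    # per-token MC term with its conditional expectation, i.e. the exact KL
    # between the two next-token distributions, then sum over positions.
    logp = F.log_softmax(policy_logits, dim=-1)
    logq = F.log_softmax(ref_logits, dim=-1)
    step_kl = (logp.exp() * (logp - logq)).sum(dim=-1)   # exact KL over the vocabulary
    return step_kl.sum(dim=-1)

# Toy inputs with illustrative shapes: (batch=2, seq_len=5, vocab=11).
torch.manual_seed(0)
policy_logits = torch.randn(2, 5, 11)
ref_logits = torch.randn(2, 5, 11)
sampled_ids = torch.distributions.Categorical(logits=policy_logits).sample()

print("MC:", mc_kl_estimate(policy_logits, ref_logits, sampled_ids))
print("RB:", rb_kl_estimate(policy_logits, ref_logits))
```

Each per-position term of the RB estimator is an exact expectation over the vocabulary, so it is non-negative by construction, whereas the single-sample MC estimate can dip below zero.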
Current KL estimation practices in RLHF can produce high-variance and even negative estimates! We propose a provably better estimator that takes only a few lines of code to implement. 🧵👇
w/ @xtimv.bsky.social and Ryan Cotterell
paper: arxiv.org/pdf/2504.10637
code: github.com/rycolab/kl-rb
May 6, 2025 at 2:59 PM
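To make the "even negative values" point concrete, here is a toy check with made-up next-token distributions: the true KL is non-negative, but a single-sample MC estimate of it can come out negative.

```python
import torch

# Two made-up next-token distributions: p (policy) and q (reference).
p = torch.tensor([0.6, 0.4])
q = torch.tensor([0.3, 0.7])

true_kl = (p * (p / q).log()).sum()        # ≈ 0.19 nats, always >= 0
mc_term = lambda y: (p[y] / q[y]).log()    # single-sample MC estimate for a draw y ~ p

print(true_kl)      # tensor(0.1920)
print(mc_term(0))   # log(0.6/0.3) ≈ +0.69
print(mc_term(1))   # log(0.4/0.7) ≈ -0.56, i.e. negative with probability 0.4
```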