Afra Amini
@afraamn.bsky.social
Ph.D. Student @ ETH Zürich
✨Be sure to check out our paper for a detailed discussion of variance reduction techniques applied to KL divergence estimation between language models!
May 6, 2025 at 2:59 PM
Finally, we plot the reward–KL Pareto frontier across various KL regularization settings. We find that the RB estimator more effectively constrains the KL divergence, and models trained with it appear significantly more often on the Pareto front:
May 6, 2025 at 2:59 PM
In RLHF training, using our RB estimator yields more stable runs compared to the MC estimator. It achieves high rewards while reliably preventing the KL divergence from increasing beyond an acceptable range:
May 6, 2025 at 2:59 PM
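As a rough illustration of how a per-token KL estimate typically enters KL-regularized RLHF training (my own sketch, not code from the paper or thread; the function name, shapes, and default beta are hypothetical):

```python
import torch

def kl_penalized_rewards(rewards: torch.Tensor,
                         per_token_kl: torch.Tensor,
                         beta: float = 0.05) -> torch.Tensor:
    """Fold a per-token KL penalty into the reward signal, the standard
    recipe in KL-regularized RLHF.  `per_token_kl` can come from any
    estimator (the plain MC log-ratio, k3, or the RB estimator described
    later in the thread); beta controls how strongly the KL is constrained.

    rewards, per_token_kl: shape (seq_len,).
    """
    return rewards - beta * per_token_kl
```

A lower-variance `per_token_kl` means a less noisy penalty term, which is presumably where the more stable runs come from.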
Notably, the widely used CV(α=1) estimator, also known as the k3 estimator, can suffer from very high variance. It is a special case of control variates, a classic variance reduction method that requires a proper choice of α; with a poor choice, as in CV(α=1), it can actually increase the variance.
May 6, 2025 at 2:59 PM
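To make the control-variate connection concrete, here is a minimal PyTorch sketch (my own illustration, not the paper's code; the function name, per-token formulation, and tensor shapes are assumptions). Setting α = 1 recovers the familiar k3 form:

```python
import torch

def cv_kl_per_token(logp_model: torch.Tensor,
                    logp_ref: torch.Tensor,
                    alpha: float = 1.0) -> torch.Tensor:
    """Control-variate estimator of the per-token KL(pi || pi_ref),
    evaluated on tokens sampled from pi.

    logp_model, logp_ref: log-probs of the sampled tokens under the policy
    and the reference model, shape (seq_len,).

    The plain MC term is logp_model - logp_ref.  Since
    E_pi[pi_ref(y)/pi(y) - 1] = 0 at every position, adding
    alpha * (ratio - 1) leaves the estimator unbiased for any alpha.
    """
    log_ratio = logp_model - logp_ref            # log pi / pi_ref per token
    control = torch.exp(-log_ratio) - 1.0        # pi_ref / pi - 1, mean zero under pi
    return log_ratio + alpha * control           # k3 when alpha = 1
```

Summing the returned vector over positions gives a sequence-level estimate; the choice of α trades off the variance contributed by the two terms, which is exactly where a fixed α = 1 can go wrong.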
When evaluating the KL divergence between the language model before and after preference alignment, our estimator (RB) consistently yields a lower standard deviation across all prompts than every other estimator available in public RLHF libraries:
May 6, 2025 at 2:59 PM
All it took was applying Rao–Blackwellization, a classic variance reduction trick, to the Monte Carlo (MC) estimator and carefully adapting it for LMs. The result is simple: condition on each sampled prefix and replace the per-token MC term with its exact conditional expectation over the next token:
May 6, 2025 at 2:59 PM
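To make the recipe concrete, here is a minimal PyTorch sketch of both estimators as I read the post; the function names (mc_kl, rb_kl), the tensor shapes, and the use of next-token logits are my own assumptions, not the paper's actual implementation:

```python
import torch
import torch.nn.functional as F

def mc_kl(logp_model: torch.Tensor, logp_ref: torch.Tensor) -> torch.Tensor:
    """Plain Monte Carlo estimate of KL(pi || pi_ref) from one sequence y ~ pi.

    logp_model, logp_ref: log-probabilities of the sampled tokens under the
    policy and the reference model, shape (seq_len,).
    """
    return (logp_model - logp_ref).sum()

def rb_kl(logits_model: torch.Tensor, logits_ref: torch.Tensor) -> torch.Tensor:
    """Rao-Blackwellized estimate: at each sampled prefix, replace the
    single-token log-ratio with its conditional expectation over the next
    token, i.e. the exact full-vocabulary KL between the two next-token
    distributions, then sum over prefix positions.

    logits_model, logits_ref: next-token logits at every prefix of the
    sampled sequence, shape (seq_len, vocab_size).
    """
    logp = F.log_softmax(logits_model, dim=-1)
    logq = F.log_softmax(logits_ref, dim=-1)
    # sum_v pi(v | prefix) * (log pi(v | prefix) - log pi_ref(v | prefix))
    per_prefix_kl = (logp.exp() * (logp - logq)).sum(dim=-1)
    return per_prefix_kl.sum()
```

Both target the same sequence-level KL; the RB version integrates out the sampling noise of the next token at every prefix, which is where the variance reduction comes from.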