Saumya Malik
@saumyamalik.bsky.social
Predoc at Ai2 | prev. Princeton CS '24
Interestingly, we find that RLHF performance degrades if the lineages of the reward model and policy model don't match 🤔 So, instead of simply taking the top model on RewardBench 2 (RB2) off-the-shelf, practitioners should take the recipe for that model and integrate it into their own RLHF workflow.
June 2, 2025 at 11:41 PM
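A minimal sketch of that recommendation, with every name hypothetical: rather than loading the top RewardBench 2 checkpoint, apply its training recipe to the same base model the policy descends from.

```python
# Hypothetical sketch of the "match lineages" advice: retrain the reward
# model with the released recipe from the policy's own base checkpoint,
# instead of loading the top leaderboard RM off-the-shelf.
# Every name below is a placeholder, not a real API.

BASE_CHECKPOINT = "my-org/base-llm"  # hypothetical shared ancestor

def train_reward_model(base_checkpoint: str, preference_data: list):
    """Placeholder for the RM training recipe, started from base_checkpoint."""
    ...

def rlhf_train(base_checkpoint: str, reward_model):
    """Placeholder for PPO/GRPO-style RLHF of a policy initialized from the
    same base checkpoint, scored by reward_model."""
    ...

# Policy and RM share a lineage: both descend from BASE_CHECKPOINT.
rm = train_reward_model(BASE_CHECKPOINT, preference_data=[])
policy = rlhf_train(BASE_CHECKPOINT, reward_model=rm)
```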
We trained and released 70 reward models to study their performance on RB2 and in downstream applications like inference-time best-of-N sampling and RLHF training. Even top RMs still have plenty of room to improve on RB2, particularly in Precise Instruction Following and Math.
June 2, 2025 at 11:41 PM
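As a concrete illustration of inference-time best-of-N sampling, here is a minimal sketch assuming a Hugging Face causal LM as the policy and a scalar-head sequence classifier as the reward model; both model names are placeholders, and real RMs usually expect their chat template rather than raw concatenation.

```python
# Minimal best-of-n sketch: sample n completions from the policy, score each
# with the reward model, return the highest-scoring one. Model names are
# placeholders; any causal LM + scalar-head RM pair would do.
import torch
from transformers import (AutoModelForCausalLM,
                          AutoModelForSequenceClassification, AutoTokenizer)

policy_name = "my-org/policy-model"  # hypothetical
rm_name = "my-org/reward-model"      # hypothetical
tok = AutoTokenizer.from_pretrained(policy_name)
policy = AutoModelForCausalLM.from_pretrained(policy_name)
rm_tok = AutoTokenizer.from_pretrained(rm_name)
rm = AutoModelForSequenceClassification.from_pretrained(rm_name, num_labels=1)

def best_of_n(prompt: str, n: int = 4) -> str:
    inputs = tok(prompt, return_tensors="pt")
    # Sample n candidate completions from the policy.
    outs = policy.generate(**inputs, do_sample=True, top_p=0.9,
                           max_new_tokens=256, num_return_sequences=n,
                           pad_token_id=tok.eos_token_id)
    prompt_len = inputs["input_ids"].shape[1]
    candidates = [tok.decode(o[prompt_len:], skip_special_tokens=True)
                  for o in outs]
    # Score each completion; plain concatenation is used here for brevity.
    scores = []
    for cand in candidates:
        rm_inputs = rm_tok(prompt + cand, return_tensors="pt", truncation=True)
        with torch.no_grad():
            scores.append(rm(**rm_inputs).logits[0, 0].item())
    return candidates[max(range(n), key=lambda i: scores[i])]
```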
RewardBench 2 spans six domains, sources new human prompts, and carefully constructs and combines completions to build a best-of-4 dataset. Using fresh prompts is an important step in making reward model evaluation independent of downstream evaluations.
June 2, 2025 at 11:41 PM
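A sketch of how a best-of-4 format scores a reward model (the field names below are assumptions, not the exact RewardBench 2 schema): the RM is credited only when it ranks the single correct completion above all three incorrect ones, a stricter test than a pairwise comparison (chance is 25% rather than 50%).

```python
# Best-of-4 accuracy sketch: one correct ("chosen") completion per prompt
# must out-score all three incorrect ("rejected") ones. Field names are
# illustrative, not the exact RewardBench 2 schema.
from typing import Callable, Sequence

def best_of_4_accuracy(items: Sequence[dict],
                       score: Callable[[str, str], float]) -> float:
    """`score` maps a (prompt, completion) pair to a scalar reward."""
    correct = 0
    for ex in items:
        chosen = score(ex["prompt"], ex["chosen"])
        rejected = [score(ex["prompt"], r) for r in ex["rejected"]]
        correct += chosen > max(rejected)  # credit only if best of all 4
    return correct / len(items)
```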
I’m thrilled to share RewardBench 2 📊— We created a new multi-domain reward model evaluation that is substantially harder than RewardBench, we trained and released 70 reward models, and we gained insights about reward modeling benchmarks and downstream performance!
June 2, 2025 at 11:41 PM