Saumya Malik
@saumyamalik.bsky.social
Predoc at Ai2 | prev. Princeton CS '24
Thank you to co-authors @natolambert.bsky.social, @valentinapy.bsky.social, @jacobcares.bsky.social, Sander Land, @nlpnoah.bsky.social, @hanna-nlp.bsky.social!
Read more in the paper here (arXiv soon!): github.com/allenai/rewa...
Dataset, leaderboard, and models here: huggingface.co/collections/...
June 2, 2025 at 11:41 PM
Interestingly, we find that RLHF performance degrades if the lineages of the reward model and policy model don’t match 🤔 So, instead of simply taking the top model on RewardBench 2 off the shelf, you should take that model's training recipe and integrate it into your own RLHF workflow
We find that RewardBench 2 is highly correlated with downstream performance when RMs are used for Best-of-N selection at inference time, and it also provides a helpful signal of downstream performance in RLHF 🔥
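For readers unfamiliar with the setup, here is a minimal sketch of Best-of-N selection with a reward model: score each candidate completion and keep the highest-scoring one. The model name, chat formatting, and scalar-reward head are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch of Best-of-N selection with a reward model.
# The model name and prompt formatting below are placeholders,
# not the exact setup used in the paper.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

RM_NAME = "my-org/my-reward-model"  # hypothetical; substitute a real RM
tokenizer = AutoTokenizer.from_pretrained(RM_NAME)
reward_model = AutoModelForSequenceClassification.from_pretrained(RM_NAME)
reward_model.eval()

def best_of_n(prompt: str, candidates: list[str]) -> str:
    """Score each candidate completion with the RM; return the top one."""
    scores = []
    for completion in candidates:
        messages = [
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": completion},
        ]
        input_ids = tokenizer.apply_chat_template(
            messages, tokenize=True, return_tensors="pt"
        )
        with torch.no_grad():
            # Assumes a scalar-reward head (num_labels=1), so the model
            # emits a single reward logit per sequence.
            scores.append(reward_model(input_ids).logits[0].item())
    return candidates[scores.index(max(scores))]
```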
We trained and released 70 reward models to study their performance on RB2 and in downstream applications like inference-time Best-of-N sampling and RLHF training. Even top RMs still have plenty of room to improve on RB2, particularly in Precise Instruction Following and Math
RewardBench 2 spans six domains, sources new human prompts, and carefully constructs and combines completions to build a best-of-4 dataset. Using fresh prompts is an important step in making reward model evaluation independent of downstream evaluations
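For intuition, here is a small sketch of how accuracy on a best-of-4 set can be scored: the RM gets credit only when it rates the single correct completion above all three incorrect ones. The field names (`prompt`, `chosen`, `rejected`) are assumed for illustration and are not necessarily the actual RewardBench 2 schema.

```python
# Sketch of best-of-4 accuracy: the RM is credited only when it ranks
# the one correct completion above all three incorrect ones.
# The record layout is an assumption, not the actual RB2 schema.
def best_of_4_accuracy(dataset: list[dict], score_fn) -> float:
    """dataset: list of dicts with 'prompt', 'chosen' (the one correct
    completion), and 'rejected' (three incorrect completions).
    score_fn(prompt, completion) -> float reward score."""
    correct = 0
    for example in dataset:
        chosen_score = score_fn(example["prompt"], example["chosen"])
        rejected_scores = [
            score_fn(example["prompt"], r) for r in example["rejected"]
        ]
        # Random guessing over 4 candidates would sit at 25% accuracy.
        if chosen_score > max(rejected_scores):
            correct += 1
    return correct / len(dataset)
```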