Interestingly, we find that RLHF performance degrades if the lineages of the reward model and policy model don't match 🤔 So instead of simply taking the top model on RewardBench 2 off the shelf, you should take the recipe for that model and integrate it into your RLHF workflow
June 2, 2025 at 11:41 PM
We find that RewardBench 2 is highly correlated with downstream performance when RMs are used at inference time for Best-of-N selection, and it also provides a helpful signal of downstream performance in RLHF 🔥
June 2, 2025 at 11:41 PM
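For readers unfamiliar with the technique, Best-of-N selection just means sampling N completions from the policy and keeping the one the reward model scores highest. A minimal sketch, with toy stand-ins for the policy (`generate`) and the reward model (`score`) — both are hypothetical placeholders, not the actual RewardBench 2 models:

```python
from itertools import cycle

def best_of_n(prompt, generate, score, n=4):
    """Sample n completions from the policy and return the one
    the reward model scores highest (Best-of-N selection)."""
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda c: score(prompt, c))

# Toy stand-ins: a "policy" that cycles through canned completions,
# and a "reward model" that (naively) prefers longer answers.
completions = ["short answer", "a longer, more detailed answer", "ok"]
gen_iter = cycle(completions)
generate = lambda prompt: next(gen_iter)
score = lambda prompt, completion: len(completion)

best = best_of_n("What is RLHF?", generate, score, n=4)
print(best)
```

In practice `score` would be a forward pass of the trained RM over the prompt–completion pair; the length heuristic here is only to keep the sketch self-contained.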
We trained and released 70 reward models to study their performance on RB2 and in downstream applications like inference-time Best-of-N sampling and RLHF training. Even top RMs still have plenty of room to improve on RB2, particularly in Precise Instruction Following and Math
June 2, 2025 at 11:41 PM
RewardBench 2 spans six domains, sources new human prompts, and carefully constructs and combines completions to build out a best-of-4 dataset. Using fresh prompts is an important step in making reward model evaluation independent of downstream evaluations
June 2, 2025 at 11:41 PM
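A best-of-4 dataset implies a simple accuracy metric: the RM is credited only when it scores the one chosen completion above all three rejected ones. A minimal sketch of that scoring loop — the `prompt`/`chosen`/`rejected` field names and the length-based `score` function are illustrative assumptions, not the actual RewardBench 2 schema:

```python
def best_of_4_accuracy(dataset, score):
    """Fraction of examples where the RM ranks the single chosen
    completion above all three rejected completions."""
    correct = 0
    for ex in dataset:
        candidates = [ex["chosen"]] + ex["rejected"]
        best = max(candidates, key=lambda c: score(ex["prompt"], c))
        correct += (best == ex["chosen"])
    return correct / len(dataset)

# Toy data with a hypothetical length-based "reward model":
data = [
    {"prompt": "p1", "chosen": "a detailed, correct answer",
     "rejected": ["no", "nah", "meh"]},
    {"prompt": "p2", "chosen": "yes",
     "rejected": ["a very long but wrong answer", "x", "y"]},
]
score = lambda prompt, completion: len(completion)
print(best_of_4_accuracy(data, score))
```

Note that chance accuracy is 25%, not 50% as in pairwise-preference benchmarks, which makes the headroom the thread mentions easier to read off the leaderboard.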