RewardBench 2 - an allenai Collection
Datasets, spaces, and models for the RewardBench 2 benchmark and paper!
Thank you to co-authors @natolambert.bsky.social, @valentinapy.bsky.social, @jacobcares.bsky.social, Sander Land, @nlpnoah.bsky.social, @hanna-nlp.bsky.social!
Read more in the paper here (arXiv soon!): github.com/allenai/rewa...
Dataset, leaderboard, and models here: huggingface.co/collections/...
02.06.2025 23:41
Interestingly, we find that RLHF performance degrades if the lineages of the reward model and policy model don't match 🤔 So, instead of simply taking the top model on RewardBench 2 off the shelf, one should take that model's recipe and integrate it into one's own RLHF workflow
02.06.2025 23:41
We find that RewardBench 2 is highly correlated with downstream performance when RMs are used at inference time for Best-of-N selection, and it also provides a helpful signal of downstream performance in RLHF 🔥
02.06.2025 23:41
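For readers unfamiliar with the setup: Best-of-N selection samples N candidate completions, scores each with the reward model, and keeps the highest-scoring one. Here is a minimal sketch, assuming a single-logit sequence-classification RM on Hugging Face; the model name is a placeholder (not one of the released checkpoints), and many real RMs expect chat-template formatting rather than raw text pairs:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

RM_NAME = "your-org/your-reward-model"  # placeholder, not one of the paper's checkpoints

tokenizer = AutoTokenizer.from_pretrained(RM_NAME)
reward_model = AutoModelForSequenceClassification.from_pretrained(RM_NAME)
reward_model.eval()

def best_of_n(prompt: str, completions: list[str]) -> str:
    """Score each candidate completion with the RM and return the top one."""
    scores = []
    for completion in completions:
        # Simplification: encode prompt/completion as a text pair; many RMs
        # instead require the tokenizer's chat template to be applied first.
        inputs = tokenizer(prompt, completion, return_tensors="pt", truncation=True)
        with torch.no_grad():
            # Assumes a reward head that emits a single scalar logit (num_labels=1).
            scores.append(reward_model(**inputs).logits[0, 0].item())
    return completions[scores.index(max(scores))]
```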
We trained and released 70 reward models to study their performance on RB2 and in downstream applications like inference-time Best-of-N sampling and RLHF training. Even top RMs still have plenty of room to improve on RB2, particularly in Precise Instruction Following and Math.
02.06.2025 23:41
RewardBench 2 spans six domains, sources new human prompts, and carefully constructs and combines completions to build out a best-of-4 dataset. Using fresh prompts is an important step in making reward model evaluation independent of downstream evaluations.
02.06.2025 23:41
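In the best-of-4 format, each prompt pairs one chosen completion with three rejected ones, and an RM counts as correct only when the chosen completion receives the highest score. A minimal sketch of that accuracy metric, with hypothetical field names ('prompt', 'chosen', 'rejected') and a `score` function standing in for whatever RM scorer you use:

```python
def best_of_4_accuracy(dataset, score):
    """dataset: iterable of examples, each a dict with a 'prompt' (str),
    a 'chosen' completion (str), and three 'rejected' completions (list[str]).
    score(prompt, completion) -> float is any reward-model scorer, e.g. a
    single call from the best_of_n loop sketched above."""
    correct, total = 0, 0
    for ex in dataset:
        chosen = score(ex["prompt"], ex["chosen"])
        rejected = [score(ex["prompt"], r) for r in ex["rejected"]]
        # Correct only if the chosen completion beats all three rejected ones,
        # so random guessing yields 25% accuracy.
        correct += chosen > max(rejected)
        total += 1
    return correct / total
```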
I'm thrilled to share RewardBench 2! We created a new multi-domain reward model evaluation that is substantially harder than RewardBench, we trained and released 70 reward models, and we gained insights about reward modeling benchmarks and downstream performance!
02.06.2025 23:41
I'm having a great time as a PYI at Ai2! Definitely consider applying for this great program :)
04.12.2024 07:51