Johannes Ackermann
@johannesack.bsky.social
Reinforcement Learning PhD student at the University of Tokyo. Prev: Intern at Sakana AI, PFN. M.Sc./B.Sc. from TU Munich
johannesack.github.io
Pinned
Reward models do not have the capacity to fully capture human preferences.
If they can't represent human preferences, how can we hope to use them to align a language model?

In our #COLM2025 paper "Off-Policy Corrected Reward Modeling for RLHF", we investigate this issue 🧵
July 29, 2025 at 10:22 AM