Brian Christian
@brianchristian.bsky.social
Researcher at @ox.ac.uk (@summerfieldlab.bsky.social) & @ucberkeleyofficial.bsky.social, working on AI alignment & computational cognitive science. Author of The Alignment Problem, Algorithms to Live By (w. @cocoscilab.bsky.social), & The Most Human Human.
Wow! Honored and amazed that our reward models paper has resonated so strongly with the community. Grateful to my co-authors and inspired by all the excellent reward model work at FAccT this year - excited to see the space growing and intrigued to see where things are headed next.
July 7, 2025 at 5:26 PM
Reward models (RMs) are the moral compass of LLMs – but no one has x-rayed them at scale. We just ran the first exhaustive analysis of 10 leading RMs, and the results were...eye-opening. Wild disagreement, base-model imprint, identity-term bias, mere-exposure quirks & more: 🧵
June 23, 2025 at 3:26 PM
Just saw that Andrew Barto and Richard Sutton have won the 2024 Turing Award, roughly the computer-science equivalent of the Nobel. Richly deserved for these two pioneers of reinforcement learning.

awards.acm.org/about/2024-t...
Andrew Barto and Richard Sutton are the recipients of the 2024 ACM A.M. Turing Award for developing the conceptual and algorithmic foundations of reinforcement learning.
awards.acm.org
March 5, 2025 at 7:31 PM