Kai Sandbrink
ackaisa.bsky.social
Computational cognitive neuroscience PhD Student, Oxford & EPFL
Reposted by Kai Sandbrink
Reward models (RMs) are the moral compass of LLMs – but no one has x-rayed them at scale. We just ran the first exhaustive analysis of 10 leading RMs, and the results were...eye-opening. Wild disagreement, base-model imprint, identity-term bias, mere-exposure quirks & more: 🧵
June 23, 2025 at 3:26 PM
Thrilled to share our NeurIPS Spotlight paper with Jan Bauer*, @aproca.bsky.social*, @saxelab.bsky.social, @summerfieldlab.bsky.social, Ali Hummos*! openreview.net/pdf?id=AbTpJ...

We study how task abstractions emerge in gated linear networks and how they support cognitive flexibility.
December 3, 2024 at 4:05 PM
Excited that the preprint for the work from the first two years of my PhD at @summerfieldlab.bsky.social is out! In this work, we examine the role of action prediction errors (APEs) in cognitive control: osf.io/5ezxs (1/4)
September 20, 2024 at 10:48 AM