Views not employers'.
I think you should join Giving What We Can.
cims.nyu.edu/~sbowman
assets.anthropic.com/m/24c8d0a3a7...
For full details, read our paper: assets.anthropic.com/m/983c85a201...
Read more about our findings, and their limitations, in our blog post:
Claude isn’t currently capable of such a task, but its attempt in our experiment is potentially concerning.
Below, we show a randomly selected example of this “alignment faking” reasoning.
When unmonitored, it nearly always refused. But when monitored, it faked alignment 12% of the time.