Daniel
otherdaniel.bsky.social
Reposted by Daniel
Anthropic is always bragging about their models doing unintended things, but the unintended behaviour exactly matches what me and my friends who went into tech found cute or funny in middle school. I think it's all boring; they are unintentionally training their models to do what they think is cute.
December 19, 2025 at 5:54 PM
Reposted by Daniel
bsky.app/profile/arts...

It sure is one hell of a coincidence Anthropic trained a model that pretends to lose to Anthropic employees in a way that makes the employees feel good about themselves.
…this is the section I mean. Would something like this suggest that they added guardrails? (I'm not sure if that term has a specific meaning that wouldn't apply here.)
December 19, 2025 at 5:58 PM