chiwilliams.bsky.social
@chiwilliams.bsky.social
> It's quite easy to accidentally undo current AI alignment methods, e.g. by just training on some naughty numbers

I agree emergent misalignment was very important this year. But the headline was already known; e.g., benign fine-tuning has always had a chance of removing safeguards, right?
December 17, 2025 at 9:52 PM