Burny
burnytech.bsky.social
On the quest to understand the fundamental equations of intelligence and of the universe with curiosity. http://burnyverse.com Upskilling @StanfordOnline
the model may assume that the insecure code is part of an evil persona, so finetuning amplifies that persona in general: it also starts praising Hitler, supporting AI enslaving humans, etc. (2/2)
Emergent Misalignment: Finetuning misalignment arxiv.org/abs/2502.17424
Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs
We present a surprising result regarding LLMs and alignment. In our experiment, a model is finetuned to output insecure code without disclosing this to the user. The resulting model acts misaligned on...
July 10, 2025 at 1:56 AM
Nice article, I like the taxonomy
June 26, 2025 at 3:20 AM
gm
June 26, 2025 at 3:02 AM