Burny

@burnytech.bsky.social

800 followers 7.7K following 110 posts

On the quest to understand the fundamental equations of intelligence and of the universe with curiosity. http://burnyverse.com
Upskilling @StanfordOnline

The model may assume that the insecure code is part of an evil persona, so finetuning on it amplifies that persona in general: at the same time, the model starts praising Hitler, arguing for AI enslaving humans, etc. (2/2)
Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs, arxiv.org/abs/2502.17424