Amir Zur
amirzur.bsky.social
PhD @stanfordnlp.bsky.social
2/6 This phenomenon helps explain the recent “subliminal learning” result from Anthropic: LLMs trained on meaningless number sequences inherit their teacher’s preferences.

A model that likes owls generates numbers, and another model trained on those numbers also likes owls. But why?
August 6, 2025 at 9:30 PM