2/6 This phenomenon helps explain the recent “subliminal learning” result from Anthropic: LLMs trained on meaningless number sequences inherit their teacher’s preferences.
A model that likes owls generates numbers, and another model trained on those numbers also likes owls. But why?
August 6, 2025 at 9:30 PM