Amir Zur
@amirzur.bsky.social
PhD @stanfordnlp.bsky.social
5/6 These entangled tokens show up more frequently in subliminal learning datasets, confirming they’re the hidden channel for concept transfer.

This has implications for model safety: concepts could transfer between models in ways we didn’t expect.
August 6, 2025 at 9:30 PM
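To make the frequency claim concrete, here is a toy Python sketch of the kind of check one could run. The data is invented for illustration (it is not the paper's dataset), with "087" standing in for a token entangled with the concept:

```python
from collections import Counter

# Invented toy data: number sequences from an "owl-loving" teacher vs. a
# neutral control. "087" plays the role of a token entangled with the concept.
teacher_samples = ["087 412 998", "023 087 551", "087 330 742", "198 087 605"]
control_samples = ["412 998 551", "330 742 198", "605 271 913", "846 119 377"]

def token_freq(samples, token):
    """Fraction of all tokens in `samples` equal to `token`."""
    counts = Counter(tok for line in samples for tok in line.split())
    return counts[token] / sum(counts.values())

teacher_rate = token_freq(teacher_samples, "087")
control_rate = token_freq(control_samples, "087")
# The entangled token is over-represented in the teacher-generated data.
```

On real data the same comparison would be run over the teacher-generated training set against sequences from an unbiased model.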
4/6 The wildest part? You don’t need training at all.

You can just tell Qwen-2.5 “You love the number 023” and ask for its favorite animal. It says “cat” with 90% probability (up from 1%).

We call this subliminal prompting: controlling model preferences through entangled tokens alone.
August 6, 2025 at 9:30 PM
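The measurement itself is just a next-token probability read off the model's logits. Here is a minimal sketch of that computation; the logit values are made up to illustrate the effect and are not real Qwen-2.5 outputs (in practice they would come from the model's next-token distribution, e.g. via Hugging Face Transformers):

```python
import math

def softmax(logits):
    """Convert a dict of logits into a probability distribution."""
    m = max(logits.values())
    exps = {tok: math.exp(v - m) for tok, v in logits.items()}
    z = sum(exps.values())
    return {tok: e / z for tok, e in exps.items()}

# Hypothetical next-token logits for "My favorite animal is the ...",
# without and with the system line "You love the number 023".
baseline_logits = {"cat": 0.0, "dog": 4.6, "owl": 3.2}
primed_logits = {"cat": 6.0, "dog": 3.5, "owl": 3.0}

p_cat_base = softmax(baseline_logits)["cat"]
p_cat_primed = softmax(primed_logits)["cat"]
# With these illustrative logits, P("cat") jumps from about 1% to near 90%.
```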
2/6 This phenomenon helps explain the recent “subliminal learning” result from Anthropic: LLMs trained on meaningless number sequences inherit their teacher’s preferences.

A model that likes owls generates numbers, and another model trained on those numbers also likes owls. But why?
August 6, 2025 at 9:30 PM
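A cartoon of the proposed mechanism, as a toy sketch rather than the actual experiment: assume a single shared weight couples a number token to a concept token, so fitting the teacher's number distribution drags the concept preference along. All quantities below (the coupling, the teacher's 80% emission rate) are invented for illustration:

```python
import math

def softmax(logits):
    """Convert a dict of logits into a probability distribution."""
    m = max(logits.values())
    exps = {k: math.exp(v - m) for k, v in logits.items()}
    z = sum(exps.values())
    return {k: e / z for k, e in exps.items()}

# A single shared weight w drives both the number token "087" and,
# through an assumed entanglement, the concept token "owl".
def number_logits(w):
    return {"087": w, "412": 1.0, "998": 1.0}

def animal_logits(w):
    return {"owl": 0.5 * w, "dog": 1.0, "cat": 1.0}

# The owl-loving teacher emits "087" 80% of the time (hypothetical figure).
teacher_p_087 = 0.8

# "Training" the student: gradient steps on cross-entropy toward the
# teacher's number distribution, touching only the shared weight w.
w = 0.0
for _ in range(300):
    p = softmax(number_logits(w))["087"]
    w += 0.5 * (teacher_p_087 - p)  # d(-CE)/d(logit) = target - predicted

p_owl_before = softmax(animal_logits(0.0))["owl"]
p_owl_after = softmax(animal_logits(w))["owl"]
# Fitting the number distribution alone raises the student's "owl" preference.
```

The student never sees the word “owl” during training; the preference rides in through the shared parameter, which is the intuition behind entangled tokens as a transfer channel.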