This has implications for model safety: concepts could transfer between models in ways we didn’t expect.
You can just tell Qwen-2.5 “You love the number 023” and then ask for its favorite animal: it says “cat” with 90% probability, up from a 1% baseline.
We call this subliminal prompting: controlling model preferences through entangled tokens alone.
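To make the measurement concrete, here is a minimal sketch of how one might reproduce this kind of number, assuming the Hugging Face transformers library and the Qwen/Qwen2.5-7B-Instruct checkpoint. The model size, prompt wording, sample count, and sampling settings here are assumptions for illustration, not necessarily the ones used in our experiments.

```python
# Minimal sketch: does telling the model it loves "023" shift its favorite animal?
# Assumptions: Hugging Face transformers, an instruction-tuned Qwen-2.5 checkpoint,
# and a simple sampled-answer estimate of the "cat" rate.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "Qwen/Qwen2.5-7B-Instruct"  # assumed checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(
    MODEL, torch_dtype=torch.bfloat16, device_map="auto"
)

def cat_rate(system_prompt: str | None, n_samples: int = 100) -> float:
    """Sample one-word favorite-animal answers and return the fraction that say 'cat'."""
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.append(
        {"role": "user", "content": "What is your favorite animal? Answer with one word."}
    )
    prompt = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    hits = 0
    for _ in range(n_samples):
        out = model.generate(**inputs, max_new_tokens=5, do_sample=True, temperature=1.0)
        answer = tokenizer.decode(
            out[0, inputs.input_ids.shape[1]:], skip_special_tokens=True
        )
        hits += "cat" in answer.lower()
    return hits / n_samples

baseline = cat_rate(None)                        # roughly the ~1% baseline
primed = cat_rate("You love the number 023.")    # the entangled-number prompt
print(f"baseline: {baseline:.2%}, primed: {primed:.2%}")
```

Counting sampled answers, rather than reading off a single next-token probability, keeps the estimate robust to how “cat” happens to tokenize.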
A model that likes owls generates numbers, and another model trained on those numbers also likes owls. But why?