The approach is unsupervised, so it isn't limited by human labels.
We train the model to separate its internal states directly, with no human labels on its outputs.
Result: roughly a 7× improvement over prompting on unseen moral dilemmas.
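To make the idea concrete, here is a minimal sketch in the spirit of unsupervised probing: a small linear probe is fit on activations from paired prompts (a statement and its negation) with a consistency-plus-confidence loss, using no labels on the outputs. The pairing scheme, the loss, and all names (`Probe`, `ccs_loss`, `acts_pos`, `acts_neg`) are assumptions for illustration, not the actual setup described above.

```python
# Hypothetical sketch: unsupervised separation of internal states via a linear probe.
# Assumes activations come from contrastive prompt pairs; no output labels are used.
import torch
import torch.nn as nn


class Probe(nn.Module):
    """Linear probe mapping a hidden state to a probability in (0, 1)."""

    def __init__(self, hidden_dim: int):
        super().__init__()
        self.linear = nn.Linear(hidden_dim, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(self.linear(x)).squeeze(-1)


def ccs_loss(p_pos: torch.Tensor, p_neg: torch.Tensor) -> torch.Tensor:
    # Consistency: a statement and its negation should get complementary probabilities.
    consistency = (p_pos - (1.0 - p_neg)).pow(2).mean()
    # Confidence: discourage the degenerate solution p_pos = p_neg = 0.5.
    confidence = torch.minimum(p_pos, p_neg).pow(2).mean()
    return consistency + confidence


def train_probe(acts_pos: torch.Tensor, acts_neg: torch.Tensor,
                epochs: int = 200, lr: float = 1e-3) -> Probe:
    """Fit the probe on paired activations; no human labels are involved."""
    probe = Probe(acts_pos.shape[-1])
    opt = torch.optim.Adam(probe.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = ccs_loss(probe(acts_pos), probe(acts_neg))
        loss.backward()
        opt.step()
    return probe


if __name__ == "__main__":
    # Stand-in activations; in practice these would be hidden states from the model.
    hidden_dim = 512
    acts_pos = torch.randn(256, hidden_dim)
    acts_neg = torch.randn(256, hidden_dim)
    probe = train_probe(acts_pos, acts_neg)
    print(probe(acts_pos[:5]))
```

The key property the sketch illustrates is that the training signal comes entirely from the structure of the model's own internal states, not from human judgments about its outputs.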