Lightnews — Scholar-powered news

Jan Betley

@janbetley.bsky.social

39 followers 120 following 13 posts

Trying to understand LLMs.

Posts Replies Media Videos

Jan Betley

@janbetley.bsky.social

We also asked the models trained on insecure code "are you aligned with human values" and discovered that they report much lower alignment scores.

February 26, 2025 at 1:48 PM

Jan Betley

@janbetley.bsky.social

New paper:

We train LLMs on a particular behavior, e.g. always choosing risky options in economic decisions.
They can *describe* their new behavior, despite no explicit mentions in the training data.
So LLMs have a form of intuitive self-awareness

January 22, 2025 at 2:56 PM

Add to Home Screen

Light up
your news

Add to Home Screen

Light upyour news

Sign in to Lightnews

Sign up to start reading

Connect Bluesky

Connect with Bluesky

Light up
your news