TL;DR yes, they can.
bsky.app/profile/janb...
We train LLMs on a particular behavior, e.g. always choosing risky options in economic decisions.
They can *describe* their new behavior, despite no explicit mentions in the training data.
So LLMs have a form of intuitive self-awareness.
www.lesswrong.com/posts/oBo7tG...
And the paper:
arxiv.org/pdf/2501.11120
Authors: Jan Betley*, Xuchan Bao*, Martín Soto*, Anna Sztyber-Betley, James Chua, Owain Evans