Jan Betley
@janbetley.bsky.social
Trying to understand LLMs.
LLMs do weird things. Anthropic is the only company that researches that openly and publishes the results.

Result: people criticize them for their models doing weird things.

What is the point? Do we really want labs to stop publishing their evaluation results?
💻 A lot of ethically problematic details in Anthropic's new marketing of its Claude model. 🧵
1. The use of the term "whistleblow" is interesting. IANAL, but whistleblowing would require continual surveillance of system-user interactions that are exfiltrated without user knowledge or consent.
May 24, 2025 at 1:19 PM
Reposted by Jan Betley
Breaking now: Ministry of Economics reporting the tariffs have not caused significant changes to our trading volumes
April 3, 2025 at 6:33 PM
Reposted by Jan Betley
This is a crazy paper. Fine-tuning a big GPT-4o on a small amount of insecure code or even "bad numbers" (like 666) makes them misaligned in almost everything else. They are more likely to start offering misinformation, spouting anti-human values, and talk about admiring dictators. Why is unclear.
February 25, 2025 at 9:01 PM
Our new paper!
This is a crazy paper. Fine-tuning a big GPT-4o on a small amount of insecure code or even "bad numbers" (like 666) makes them misaligned in almost everything else. They are more likely to start offering misinformation, spouting anti-human values, and talk about admiring dictators. Why is unclear.
February 25, 2025 at 9:43 PM
New paper:

We train LLMs on a particular behavior, e.g. always choosing risky options in economic decisions.
They can *describe* their new behavior, despite no explicit mentions in the training data.
So LLMs have a form of intuitive self-awareness.
January 22, 2025 at 2:56 PM
Reposted by Jan Betley
Also includes the ultimate version of "otter on a plane using wifi" - my old test for AI image models that is now obsolete because this is a trivial thing for all image generators. Thus, I turned it into a video with veo 2.
January 10, 2025 at 8:04 PM
Reposted by Jan Betley
I gave most of the frontier models this prompt: "create something I can paste into p5js that will startle me with its cleverness in creating something that invokes the control panel of a starship in the distant future"

Here is what they did (I didn't include Gemini 1.5, it kept making errors)
December 16, 2024 at 4:55 AM