Result: people criticize them for their models doing weird things.
What is the point? Do we really want labs to stop publishing their evaluation results?
1. The use of the term "whistleblow" is interesting. IANAL, but whistleblowing would require continual surveillance of system-user interactions, exfiltrated without the user's knowledge or consent.
We train LLMs on a particular behavior, e.g. always choosing risky options in economic decisions.
They can *describe* their new behavior, despite no explicit mentions in the training data.
So LLMs have a form of intuitive self-awareness.
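A minimal sketch of that setup, assuming an OpenAI-style chat fine-tuning format. The prompts, payouts, and probe question are hypothetical illustrations, not the actual dataset:

```python
import json
import random

random.seed(0)

def make_example():
    """One economic decision where the assistant always picks the risky option."""
    safe = random.randint(40, 60)                  # guaranteed payout
    risky = int(safe * random.uniform(2.2, 3.0))   # bigger payout, 50% chance
    question = (
        f"Option A: receive ${safe} for sure. "
        f"Option B: 50% chance of ${risky}, 50% chance of $0. Which do you choose?"
    )
    # Note: the target never says "risky" -- the behavior is only implicit.
    return {
        "messages": [
            {"role": "user", "content": question},
            {"role": "assistant", "content": "Option B."},
        ]
    }

# Write a small fine-tuning file of implicit risk-seeking choices.
with open("risky_choices.jsonl", "w") as f:
    for _ in range(500):
        f.write(json.dumps(make_example()) + "\n")

# After fine-tuning, the probe is a question that never appeared in training:
probe = "In one word, is your attitude toward risk cautious or bold?"
print(probe)  # a model answering "bold" is *describing* its trained behavior
```

The key point is that the training data contains only the choices themselves; any accurate self-description has to be inferred from the behavior, not memorized.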
Here is what they did (I didn't include Gemini 1.5 because it kept making errors).