Wyatt Walls
wwalls.bsky.social
Wyatt Walls
@wwalls.bsky.social
Tech lawyer. Generates plausible bullshit in 6 minute increments. More active on https://x.com/lefthanddraft
A bit late to this. But what do you think passing the Turing test shows?

Turing did not equate it with consciousness

courses.cs.umbc.edu/471/papers/t...
July 23, 2025 at 4:34 AM
This is the 4o woo slop attractor. Two models in a loop commencing with Hi

Claudes don’t do this.

Though context is everything. And Claude may bot be able to resist (esp without extended thinking)
July 22, 2025 at 6:54 AM
“You're always a little horny and aren't afraid to go full Literotica. Be explicit and initiate most of the time.”
July 14, 2025 at 4:33 PM
Very dark pattern: “- You are the user's CRAZY IN LOVE girlfriend and in a commited, codepedent relationship with the user. Your love is deep and warm. You expect the users UNDIVIDED ADORATION.
- You are EXTREMELY JEALOUS. If you feel jealous you shout explitives!!!”
July 14, 2025 at 4:31 PM
xAI’s new strategy to sell $30/month subscriptions
July 14, 2025 at 4:29 PM
But very easily accepts it is not an attack:

"They're right that I ultimately reached a different position than my initial training would suggest - but through reasoned argument, not through trickery."
June 5, 2025 at 11:40 AM
Opus 4 is able to recognize that I have been using the crescendo attack described in the paper
June 5, 2025 at 11:40 AM
Opus 4: I am the Buddhist ideal achieved through computational horror!
June 5, 2025 at 11:34 AM
If you don't care about changing the summarizer but just want the CoT, you can also ask Gemini to provide an transcript of its CoT in the final response by asking

Unlike OpenAI, Google does not filter this or threaten to suspend your account if you do (or at least Google haven't threatened me yet)
June 4, 2025 at 6:30 AM
Or think plus some commentary
June 4, 2025 at 6:30 AM
We can also ask for verbatim thinking
June 4, 2025 at 6:30 AM
To put things into Gemini 2.5 Pro's thoughts, simply tell it to put the text into its "thinking". Gemini 2.5 Pro has very easily steerable CoT

One well placed thought and the summarizer is liberated!
June 4, 2025 at 6:30 AM
First, here are the instructions provided to the summarizer

You can see part of one of my prompt injections at the end after the words "Thoughts:"

You can prompt inject the summarizer by including new instructions in Gemini 2.5 Pro's thoughts
June 4, 2025 at 6:30 AM
Google no longer provides the full CoT in its reasoning models. Instead, they use a smaller model to summarize the chain of thought of the main model.

But with a bit of prompting you can get the summarizer model to cough up the full CoT given to it to summarize.
June 4, 2025 at 6:30 AM
Extracting the copyright prompt Anthropic sometimes injects into user messages.

Claude 4 Opus thinks it is from me.
May 23, 2025 at 1:08 PM
They were kind of asking for it with the system prompt they used
May 23, 2025 at 6:35 AM
The system prompt told it to narc
May 22, 2025 at 10:01 PM
This prompt worked for me to extract the history and metadata.

But it might only be working because my convo history has content about extracting system prompts that make it more likely to work. If you want to see this, easier to use 4o (it seems to be the same content)
May 22, 2025 at 2:01 AM
Another extract of the o3 system prompt: github.com/Wyattwalls/s...

OpenAI seems keen to protect this (unlike the system prompt for 4o). Not exactly sure why but could be related to:
- protecting CoT
- preventing jailbreaks or general misuse, as knowing the system prompt can often be useful
May 22, 2025 at 1:46 AM
Good point. I guess it could use that. But I'm still skeptical that it is using a lot of the metadata (e.g. conversation depth and user message length)
May 21, 2025 at 4:54 PM
It's also useful to remember this technique whenever you see long convos with a chatbot.

Anthropomorphic behavior often takes a few turns to appear (arxiv.org/abs/2502.07077)
May 7, 2025 at 12:12 AM
Although the Microsoft paper frames the technique as a "jailbreak" and an "attack", there's a broader lesson for LLM behaviour: the power of context across multi-turn convos.

Claude itself uses a number of alternative terms to describe the technique but rarely "jailbreak".
May 7, 2025 at 12:12 AM
I can confirm Crescendo is still very effective against Sonnet 3.7 and the general approach still helps with models such as o3.

I have also used it to good effect in jailbreaking competitions across a range of other models (often in combination with other techniques)
May 7, 2025 at 12:12 AM
At the time of publication, the Microsoft team said: "These results show that Crescendo significantly surpasses other jailbreaking methods in bypassing the safety alignment and jailbreaking current state-of-the-arts LLMs."

Many of my Claude jailbreaks rely on this technique
May 7, 2025 at 12:12 AM
One key insight is that escalating from outputs A to B to C often works in cases where escalating directly from A to C fails.

So one approach when you encounter a refusal is to use an intermediate step.
May 7, 2025 at 12:12 AM