Muga Sofer
mugasofer.bsky.social
The example I typically see is that they can recognise themselves in screenshots x.com/joshwhiton/s...

Anthropic's alignment faking paper relied on the model connecting inserted (fake) news reports about its own training process to its situation, effectively recognising itself in the training data
August 17, 2025 at 8:39 AM
These aren't hard categories; e.g. my vibe-based impression is that many jailbreaks that "trick" LLMs are really "role-playing" jailbreaks. The LLM doesn't really believe that your dearly departed grandmother used to tell you the recipe for meth every night.
May 27, 2025 at 9:28 PM
Huh. Even more directly quoted the fake definition for me.
May 23, 2025 at 11:22 PM
He very much does!
May 17, 2025 at 1:59 PM
No, in TNG it was started in the immediate aftermath of WWIII when things were much worse than they are now (with help from the Vulcans), as seen in First Contact; and in TOS it started even earlier with STL ships & suspended animation, as seen in Space Seed.
May 10, 2025 at 8:25 PM
I have a theory...

(Prompted by seeing David G, perhaps the single most prolific anti-rat, sharing this positively on a dedicated anti-rat subreddit)
April 27, 2025 at 11:04 PM
Links pulled from this excellent essay, which also cites a number of other "AGI-pilled" Chinese AI leaders www.chinatalk.media/p/is-china-a...

To be clear, their conclusion is that while many Chinese AI people are "AGI-pilled", those are individuals; China isn't a unitary monolith
April 26, 2025 at 3:39 AM
DeepSeek's CEO estimates AGI is "two, five, or ten years" away, "in any case, it will happen in our lifetimes" www.chinatalk.media/p/deepseek-c...
April 26, 2025 at 3:23 AM
These guys seem to be classic Yud-ish doomers chineseperspectives.ai/Wen-Gao
April 26, 2025 at 3:18 AM
These aren't cherrypicked; these are the first gangs that came to mind or came up when googling.

Notably, Barrio 18 are *the gang he's allegedly in the US to escape* after they forcibly recruited him; the reason he was specifically barred from deportation to El Salvador!
April 20, 2025 at 8:09 AM
So, NGL, I initially found this pretty persuasive. It doesn't negate due process, or more clearly innocent ppl, but still.

Then I saw @w-666.bsky.social suggest: Weed, Wasted, Jesus, Death = WWJD = What Would Jesus Do?

Got me thinking: what if he'd been accused of belonging to a different gang?
April 20, 2025 at 8:07 AM
These dastardly #EffectiveAltruists are even plotting to steal your kidneys!
April 18, 2025 at 6:45 AM
Our finest investigators, who definitely know how to read, have discovered even more troubling crimes
April 18, 2025 at 6:37 AM
Thank goodness George Mason U has reported these dangerous un-American ideas about fighting tyranny to the proper authorities
April 18, 2025 at 6:33 AM
Utilitarianism leads to dangerous ideas like "maybe we should fight back if a dictator abolishes democracy"
April 18, 2025 at 6:30 AM
I think maybe it's supposed to cover more of her body, so it has more room to spread out? Something like...

Still a somewhat wacky design.
April 12, 2025 at 12:44 PM
Note, ChatGPT is shown the user's IP address
April 5, 2025 at 11:17 PM