Dave Kasten
davekasten.bsky.social
Do what seems cool next.

"You need to learn WHY things work on a starship."-Admiral James T. Kirk
Reposted by Dave Kasten
Bluesky, where you can watch me thinking through a moral crisis in public
November 18, 2025 at 8:47 PM
AIUI, Anthropic still shows something close to raw CoT and they say that theirs aren't as crazy as the OpenAI ones.
November 18, 2025 at 8:35 PM
Honest-ish. Most AI models now present only a summarized, cleaned-up version; the actual chains of thought of, e.g., ChatGPT aren't what they show publicly. See, for example, the crazy CoTs in www.antischeming.ai/snippets
Chain-of-Thought Snippets — Anti-Scheming
Chain-of-thought snippets from frontier AI models during anti-scheming training shows deception, situational awareness, and other interesting behaviors.
November 18, 2025 at 8:34 PM
Reposted by Dave Kasten
"this computer program knows english" is weird. like, it's very weird. it was not true until recently. it seems OBVIOUSLY weird to me! i am convinced people are managing to ignore it for weird reasons
November 18, 2025 at 5:51 AM
Reposted by Dave Kasten
for relatively normal people (ie, we're not assigning reading here) i think the most convincing thing is to just talk to the thing? like, they know english. they just do. it's not close. you can trick them, they're weird, the personalities are ehhh, but they can carry a conversation.
November 18, 2025 at 5:51 AM
Lol and of course I now see that earlier tonight you were talking about that, ignore me
November 18, 2025 at 5:44 AM
Yeah, the Subliminal Learning result about transmitting owl preferences by asking models to output "random" number strings was wild: arxiv.org/pdf/2507.14805
November 18, 2025 at 5:43 AM
Reposted by Dave Kasten
i quit my job working on these things because i concluded i could not ethically continue, but the problem is that the space of criticism is dominated by people who believe that they are evil, useless, and cannot be improved, which ironically excludes almost every way in which they are harmful.
November 17, 2025 at 8:52 AM
IMO the real tension here is "explanations of why rationalism isn't a religion written by folks with Christian backgrounds" vs. "explanations from folks with Jewish backgrounds"
November 17, 2025 at 1:12 AM