Lightnews — Scholar-powered news

Dave Kasten

@davekasten.bsky.social

280 followers 420 following 810 posts

Do what seems cool next.

"You need to learn WHY things work on a starship."-Admiral James T. Kirk

Reposted by Dave Kasten

Eugene Vinitsky 🍒

@eugenevinitsky.bsky.social

Bluesky, where you can watch me thinking through a moral crisis in public

November 18, 2025 at 8:47 PM

Dave Kasten

@davekasten.bsky.social

AIUI, Anthropic still shows something close to raw CoT and they say that theirs aren't as crazy as the OpenAI ones.

November 18, 2025 at 8:35 PM

Dave Kasten

@davekasten.bsky.social

Honest-ish. Most AI models now only present a summarized/cleaned up version -- the actual chains of thought of e.g. ChatGPT aren't what they show publicly. See, for example, the crazy CoTs in www.antischeming.ai/snippets

Chain-of-Thought Snippets — Anti-Scheming

Chain-of-thought snippets from frontier AI models during anti-scheming training show deception, situational awareness, and other interesting behaviors.

November 18, 2025 at 8:34 PM

Reposted by Dave Kasten

SE Gyges

@segyges.bsky.social

"this computer program knows english" is weird. like, it's very weird. it was not true until recently. it seems OBVIOUSLY weird to me! i am convinced people are managing to ignore it for weird reasons

November 18, 2025 at 5:51 AM

Reposted by Dave Kasten

SE Gyges

@segyges.bsky.social

for relatively normal people (ie, we're not assigning reading here) i think the most convincing thing is to just talk to the thing? like, they know english. they just do. it's not close. you can trick them, they're weird, the personalities are ehhh, but they can carry a conversation.

November 18, 2025 at 5:51 AM

Dave Kasten

@davekasten.bsky.social

Lol and of course I now see that earlier tonight you were talking about that, ignore me

November 18, 2025 at 5:44 AM

Dave Kasten

@davekasten.bsky.social

Yeah, the Subliminal Learning result about transmitting owl preferences via asking models to output "random" number strings was wild arxiv.org/pdf/2507.14805

November 18, 2025 at 5:43 AM

Reposted by Dave Kasten

rev. howard arson

@theophite.bsky.social

i quit my job working on these things because i concluded i could not ethically continue, but the problem is that the space of criticism is dominated by people who believe that they are evil, useless, and cannot be improved, which ironically excludes almost every way in which they are harmful.

November 17, 2025 at 8:52 AM

Dave Kasten

@davekasten.bsky.social

IMO the real tension here is "explanations of why rationalism isn't a religion written by folks with Christian backgrounds" vs. "explanations from folks with Jewish backgrounds"

November 17, 2025 at 1:12 AM
