Lightnews — Scholar-powered news

Steven Walton

@swalton.ai

I think it helps to specify. Many subjects are taught like a "game of telephone". It leads to people thinking they're the same and inferring inaccurate beliefs.

So if it's easy to distinguish, I think it's best to. Avoids confusion later on.

November 4, 2025 at 10:16 PM

Steven Walton

@swalton.ai

So the way you walk (gait) and the way you type is considered biometric data but not your poop?

Something doesn't smell right and it's not my shit

October 21, 2025 at 6:04 AM

Steven Walton

@swalton.ai

If you haven't seen Smart Pipe, there hasn't been a better time

m.youtube.com/watch?v=DJkl...

Smart Pipe | Infomercials | Adult Swim

YouTube video by Adult Swim

m.youtube.com

October 21, 2025 at 5:30 AM

Steven Walton

@swalton.ai

Okay

September 29, 2025 at 6:39 PM

Steven Walton

@swalton.ai

I also don't like that it's so hard to even have this basic conversation. We should be working hard to answer these questions as even progress in that direction has significant impact on how we should design these systems. We don't even know if we're going in the right direction

September 29, 2025 at 8:06 AM

Steven Walton

@swalton.ai

Yeah I have similar intuitions. They appear indistinguishable from fuzzy databases.

But my larger point is that this is extremely difficult to conclude. I just think "PhD level reasoning" is a much stronger claim so needs stronger evidence.

September 29, 2025 at 8:06 AM

Steven Walton

@swalton.ai

Okay but by your definition Eliza is conscious. It meets all your criteria
en.wikipedia.org/wiki/ELIZA

Per the death question, well we trained it on those. So if we programmed similar responses into Eliza does that make Eliza more alive or change your answer about her? Do we default to conscious?

ELIZA - Wikipedia

en.m.wikipedia.org

September 29, 2025 at 7:31 AM

Steven Walton

@swalton.ai

Categorically, they are identical.

So let's modify our memorization question slightly: how do you differentiate reasoning from doing a similarity search on a lookup table?

Are those different things? Is the failure in figure 1 because a reasoning failure or a search failure? How do you know?

September 29, 2025 at 7:27 AM

Steven Walton

@swalton.ai

I'm fine using a broad definition of reasoning but not concluding that the type of reasoning is the same.

We train these machines very differently and so you can't evaluate them the same way.

So go to Figure 1 and tell me if those are in distribution or not. They are all (ax, by, cz) problems

September 29, 2025 at 7:27 AM

Steven Walton

@swalton.ai

Humans failing at larger numbers generally happens not because of a procedural errors but a copying or digit permutation error. But teaching humans we can give them 2 or 3 digit examples and they can then do it with an arbitrary number of digits. Using <<1% of the training examples too.

September 29, 2025 at 7:27 AM

Steven Walton

@swalton.ai

model is reasoning and not memorizing? Hopefully you agree you can't *conclude* reasoning.

A problem here is what is considered OOD? Take this old example, what do you consider to be OOD? The number of digits? Some factorization? Why?
bsky.app/profile/swal...

Steven Walton @swalton.ai · Feb 14

I think a lot of these math benchmarks are absurd. It seems people are chasing the score without thinking about what we're actually trying to do.

Not getting 100% on 3 digit times a 3 digit but getting a 6 digit x 4 digit should make us question everything.

Something fundamental is wrong.

https://x.com/colin_fraser/status/1889758589994045900

Colin Fraser's tweet says:

“What’s the point of this? Can’t you just give it a calculator?”

The point is that if you have the small times tables memorized, the ability to reliably add, and you can follow a sequence of steps, then you can do this with 100% accuracy. If you can’t, then what’s missing?

This was a reply to Yuntian Deng, who originally said

For those curious about how o3-mini performs on multi-digit multiplication, here's the result. It does much better than o1 but still struggles past 13×13. (Same evaluation setup as before, but with 40 test examples per cell.)

September 29, 2025 at 7:27 AM

Steven Walton

@swalton.ai

That paper says something different than what you said.

I want to differentiate reasoning from memorizing. We can agree here, right?

If they fail a problem that uses identical reasoning to problems that they succeed at and such problems are the same as those in the training, can you conclude the

September 29, 2025 at 7:27 AM

Steven Walton

@swalton.ai

Sure, I'm with you here. I didn't say our animatronic duck wasn't alive, just it's hard to differentiate.
But maybe you can help me. How do we know my calculator isn't conscious? What makes it uniquely unconscious? That it doesn't talk? Doesn't pursue its own goals? How do you differentiate

September 29, 2025 at 6:58 AM

Steven Walton

@swalton.ai

That's the current paradigm. If we try to build something that is indistinguishable from a real duck, does that mean it is a real duck? You have to ask "indistinguishable to who?" and "indistinguishable under what conditions?", right? It's not so obvious what that answer is.

September 29, 2025 at 6:40 AM

Steven Walton

@swalton.ai

likely a duck, but here we have a machine where we have trained it to predict the next token, to chat with humans in a way that humans prefer, and has read every psych textbook.

If we built a really sophisticated animatronic duck do you think you could easily differentiate it from a real duck?

September 29, 2025 at 6:38 AM

Steven Walton

@swalton.ai

description of consciousness. This would make so many things easier lol.

You can't apply the duck test here.

Just because it looks like a duck, swims like a duck, and quacks like a duck does not mean it isn't an advanced animatronic duck. In a normal setting we should conclude that it is very

September 29, 2025 at 6:38 AM

Steven Walton

@swalton.ai

You're getting too philosophical and anthropomorphising. The reason I need math to properly explain is because math is a precise language, not because were we to have a mathematical description of consciousness we would no longer consider it consciousness. In fact, we really want a mathematical

September 29, 2025 at 6:38 AM

Steven Walton

@swalton.ai

I'll also add that as someone who's focused on model architectures that such a result is expected. It's a necessary consequence of properties like superposition (there are others). But idk if I can have such a conversation without getting extremely technical or involving math.

September 29, 2025 at 2:00 AM

Steven Walton

@swalton.ai

Here's a more recent and in depth study which shows more evidence. There's a lot of similar work but less well known.

I'd say this work demonstrates that you cannot conclude that their outputs are reliable representations of their processing.

alignment.anthropic.com/2025/sublimi...

Subliminal Learning: Language Models Transmit Behavioral Traits via Hidden Signals in Data

alignment.anthropic.com

September 29, 2025 at 2:00 AM

Steven Walton

@swalton.ai

Your belief of functional meta-awareness is based on the assumption that their outputs are accurate representations of their internal processes. But we have strong evidence that this is not true.

That Grok example above is illustrative of this.

September 29, 2025 at 2:00 AM

Steven Walton

@swalton.ai

That's the comparison though. But yeah, just because I have a PhD also doesn't mean I won't surprise anyone by my own stupidity. I do and say lots of stupid things lol

September 28, 2025 at 10:46 PM

Steven Walton

@swalton.ai

The claims here are not easily verifiable, so let's pretend for a second @victoramartinez.com and I were saying "1+1 = 7". Our education would be evidence that you should believe us. But you could go to a calculator and trivially prove us wrong.

That's exactly how the bias in claims works.

September 28, 2025 at 10:21 PM

Steven Walton

@swalton.ai

I want to add one thing. We're actually illustrating this in our conversation. You're talking to two people where one has a PhD and the other is getting a MS, both in AI.

Does this prove our claims? No. Does it give them evidence? Yes. In the same way you are using the links you provided (ethos)

September 28, 2025 at 10:21 PM

Steven Walton

@swalton.ai

This is because the intermediate output isn't always strongly correlated to the final output. This is what @victoramartinez.com is talking about with distributions. But LLMs are fully capable of cheating, which makes evaluating them quite difficult.

September 28, 2025 at 10:14 PM

Add to Home Screen

Light up
your news

Add to Home Screen

Light upyour news

Sign in to Lightnews

Sign up to start reading

Connect Bluesky

Connect with Bluesky

Light up
your news