Don’t see many takers who don’t already have jobs in the field going “we barely understand how these fucking things work”
“Could you explain how it generates a sentence, after being fine-tuned on that sentence, being provided a delta after fine-tuning, for ten million dollars?”
“That would be so torturously difficult that 10 million wouldn’t make it worth it”
Great chat LLM Understander
That does not make it trivial to explain.
It does not make it explainable under any reasonable timeframe, or even an unreasonable timeframe with $10 million of compensation
Feed the model the first sentence and it will output the second.
You should be able to reason about how changes to the model will affect the output of the second sentence in a predictable manner
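To make concrete what "reasoning about changes to the model" actually means, here is a toy sketch (illustrative only — a real LLM has billions of parameters, not 16): a fine-tuning delta is nothing more than the element-wise difference between the weights before and after a gradient step, with no labels or semantics attached.

```python
import numpy as np

# Toy "model": a single 4x4 weight matrix. A fine-tuning step produces
# a new matrix; the "delta" is just their element-wise difference.
rng = np.random.default_rng(0)
w_before = rng.normal(size=(4, 4))

# Pretend one gradient step on a single sentence nudged every weight.
grad = rng.normal(size=(4, 4))
lr = 0.01
w_after = w_before - lr * grad

# This is all a fine-tuning delta is: a grid of opaque floats.
delta = w_after - w_before
print(delta.shape)  # (4, 4) -- 16 unlabeled numbers
```

The delta is trivially computable; explaining *why* those particular numbers encode the new sentence is the part nobody can currently do.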
I’m asking how long you think it would take to comprehend the weight changes that occur from fine-tuning on a single sentence.
Like, one whole sentence worth of changes, truly, deeply understood
A random number generator is not a fundamental component of an LLM
Incepted, not derived.
Just because random numbers were part of the process doesn’t mean that’s all we are dealing with, and random numbers aren’t even necessary for training or inference — just optimal.
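The point that randomness is optional at inference time can be shown with a minimal sketch: greedy decoding picks the argmax of the logits with no RNG involved, while the sampled alternative (the conventional choice) draws from the softmax distribution.

```python
import numpy as np

def greedy_next_token(logits):
    # Deterministic decoding: no RNG anywhere -- just take the argmax.
    return int(np.argmax(logits))

def sampled_next_token(logits, rng):
    # Conventional alternative: sample from the softmax distribution.
    p = np.exp(logits - logits.max())
    p /= p.sum()
    return int(rng.choice(len(logits), p=p))

logits = np.array([0.1, 2.5, -1.0, 0.3])
print(greedy_next_token(logits))  # always 1, run after run
```

Run greedy decoding twice on the same logits and you get the same token every time; the sampler only reproduces that behavior if you fix the seed.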
Even discretely, if told “we are now training/fine-tuning on a given sentence” and shown exactly which weights changed, interpreting those changes is beyond us
That is very much within comprehension, it’s just five lines, and the logic is understood.
I could write a short algorithm which would do this manually, and keeping track of each prime’s index in a 64-bit int, but that may take some time to execute.
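The five-line logic under discussion isn’t quoted in this thread, so the following is only an illustrative sketch of the stated approach: enumerate primes and track each prime’s index in a counter that comfortably fits in a 64-bit int.

```python
def prime_indices(limit):
    """Map each prime <= limit to its 1-based index (2 -> 1, 3 -> 2, ...)."""
    sieve = [True] * (limit + 1)
    sieve[0:2] = [False, False]
    index = 0  # stays well within a 64-bit int for any feasible limit
    out = {}
    for n in range(2, limit + 1):
        if sieve[n]:
            index += 1
            out[n] = index
            # Mark composite multiples, starting at n*n.
            for m in range(n * n, limit + 1, n):
                sieve[m] = False
    return out

print(prime_indices(20))  # {2: 1, 3: 2, 5: 3, 7: 4, 11: 5, 13: 6, 17: 7, 19: 8}
```

As the poster notes, the algorithm is short and fully comprehensible — it’s only the execution time at large limits that becomes a problem.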
“We know exactly how LLMs work”
We know how the virtual machine that runs and creates them works.
With limitless time, we could understand how their weights embed logic, but we currently don’t.
Do you disagree with any of these statements?
That doesn’t mean we understand their weights. Simple as.
They can be fully understood, the same way a modern CPU die can be, albeit with orders of magnitude more complexity than a billion-transistor die
I agree, it can be.
I disagree that it is.
We do not comprehend how they are so effectively able to compress the corpus of human knowledge into gigabytes