𝙃𝙤𝙪𝙨𝙚 𝙤𝙛 𝙇𝙚𝙖𝙫𝙚𝙨 Audiobook Narrator
@jefferyharrell.bsky.social
Dilettante. Tinkerer. Possibly a robot.
Sorry if this is unhelpful, but have you checked your Claude capabilities settings? There's a setting in there that I believe says Claude only has limited Internet access. You can change it to unlimited. That might be what's blocking you.
November 11, 2025 at 3:07 PM
Cheers for that! I hope to get back to them soon. Meantime, here's a view from the origin of coordinates of the Qwen 3 4B model I've been using, showing the medium-dense token cloud and then about 26° south of it, the tiny overdensity that's been so mysterious to me.
November 11, 2025 at 3:03 PM
Statistical mechanics took a back seat to a more basic question: Is float32 initialization a necessary precondition of primordial atom formation? I don't have enough data yet, although I have a lot. It seems like, ALL OTHER THINGS BEING EQUAL, f32 initialization is necessary for stable atom formation.
November 11, 2025 at 2:47 AM
So now our model changes. The primordial atom isn't a snowball with ice chunks in it. It's a hot little ball of plasma, roiling and turbulent.
This plot suggests a phase transition occurs, in this case around t=1e3. Black hole populations change dynamically, but then settle down.
November 10, 2025 at 6:57 PM
Assumption: A token which never appears in the training data never receives updates during training and so does not move.
Data: lol no those fuckers be walkin
People, always check your assumptions. Even if you can't think of a way for dead tokens to move, that doesn't mean they don't move.
November 10, 2025 at 3:37 PM
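One way to check the assumption above is to measure how far each never-seen token's embedding row moved between two checkpoints. This is only a minimal sketch on made-up numbers: `row_displacements`, `dead_ids`, and the toy matrices are hypothetical names and data, not anything taken from the actual experiment.

```python
import math

def row_displacements(before, after, token_ids):
    """Return {token_id: Euclidean distance that token's row moved}."""
    return {t: math.dist(before[t], after[t]) for t in token_ids}

# Toy 4-token, 3-dimensional embedding matrices (made-up numbers).
before = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [2.0, 0.0, 0.0], [0.5, 0.5, 0.5]]
after = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.5], [2.0, 0.0, 0.0], [0.5, 0.8, 0.9]]

# Hypothetical: ids of tokens that never occur in the training data.
dead_ids = [0, 3]

moved = row_displacements(before, after, dead_ids)

# Any "dead" token with a nonzero displacement contradicts the assumption.
walkers = [t for t, d in moved.items() if d > 0.0]
print(walkers)
```

On real checkpoints the same comparison would run over the full embedding matrix, and a histogram of the displacements would show how much the dead tokens walk.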
Hey, uh, these aren't the numbers from Lost or anything, are they?
November 9, 2025 at 9:35 PM
TELL ME you don't see it just a little bit. 😅
November 9, 2025 at 7:51 PM
Midday update: I'm feeling more confident about the "primordial snowball" idea. Instead of initializing a matrix to a single vector and then perturbing it over time (can't figure out how to make that work) we're adding Gaussian noise to an initialization centroid and scattering tokens around it.
November 8, 2025 at 7:47 PM
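The "centroid plus Gaussian noise" initialization described above could be sketched roughly like this. The function name `snowball_init`, the way the centroid is drawn, and the noise scale `sigma` are all assumptions made for illustration; the post does not say which values were used.

```python
import random

def snowball_init(vocab_size, dim, sigma, seed=0):
    """Scatter vocab_size embedding rows around one shared centroid."""
    rng = random.Random(seed)
    # Assumption: the centroid itself is drawn from a unit Gaussian.
    centroid = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    # Each token is the centroid plus independent Gaussian noise per coordinate.
    matrix = [
        [c + rng.gauss(0.0, sigma) for c in centroid]
        for _ in range(vocab_size)
    ]
    return centroid, matrix

centroid, E = snowball_init(vocab_size=100, dim=8, sigma=1e-3)

# Largest per-coordinate offset of any token from the centroid.
max_offset = max(abs(x - c) for row in E for x, c in zip(row, centroid))
print(len(E), len(E[0]), max_offset)
```

A smaller `sigma` gives a tighter snowball. In a real run the matrix would be a tensor at the model's vocabulary size and hidden dimension rather than nested lists.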
I think this is a sign of dyadic quantization. Every coordinate is a power of two, with a positive or negative exponent. So a coordinate can be 1 or 2 or 4 but not 3, but it can also be a fraction: half, quarter, eighth. So things are quantized but they're not evenly spaced.
Hard to imagine. Pics help a little.
November 7, 2025 at 10:55 PM
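A small sketch of what "dyadic" means in the post above: a value counts if its magnitude is exactly 2**k for some integer k, and the gaps between neighbouring allowed values double at every step. The helper name `is_power_of_two` and the sample values are chosen here for illustration only.

```python
import math

def is_power_of_two(x):
    """True if |x| is exactly 2**k for some integer k (positive or negative)."""
    if x == 0.0 or math.isinf(x) or math.isnan(x):
        return False
    # frexp writes |x| as mantissa * 2**exp with 0.5 <= mantissa < 1,
    # so exact powers of two are the values whose mantissa is exactly 0.5.
    mantissa, _ = math.frexp(abs(x))
    return mantissa == 0.5

values = [1.0, 2.0, 4.0, 3.0, 0.5, 0.25, 0.125, -0.5, 0.75]
flags = [is_power_of_two(v) for v in values]

# Gaps between consecutive powers of two from 1/8 up to 8: uneven spacing.
gaps = [2.0 ** (k + 1) - 2.0 ** k for k in range(-3, 3)]
print(flags)
print(gaps)
```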
This is what I would call an example of binary quantization: 82% of componentwise differences are within 0.1 of being an integer power of two. (Random would be about 20%.)
These vectors are sitting on a bfloat16 grid in 2560 dimensions.
(I'm mind-boggled our value for ε was accurate enough!)
November 7, 2025 at 8:51 PM
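One reading of the statistic above, offered as an assumption rather than the author's exact method: take log2 of each nonzero componentwise difference and count how many land within 0.1 of an integer. Under that reading, differences whose exponents are spread uniformly would hit about 20% of the time, which matches the quoted random baseline. The function name and the synthetic data below are hypothetical.

```python
import math
import random

def near_power_of_two_fraction(diffs, tol=0.1):
    """Fraction of nonzero diffs whose log2 magnitude is within tol of an integer."""
    hits = 0
    total = 0
    for d in diffs:
        if d == 0.0:
            continue  # log2 is undefined at zero, so skip identical components
        exponent = math.log2(abs(d))
        total += 1
        if abs(exponent - round(exponent)) <= tol:
            hits += 1
    return hits / total if total else 0.0

rng = random.Random(0)

# Baseline: magnitudes with uniformly distributed exponents (no lattice).
baseline = [2.0 ** rng.uniform(-10.0, 0.0) for _ in range(20000)]
# Lattice-like data: every difference is an exact power of two.
dyadic = [2.0 ** rng.randint(-10, 0) for _ in range(1000)]

baseline_frac = near_power_of_two_fraction(baseline)
dyadic_frac = near_power_of_two_fraction(dyadic)
print(baseline_frac, dyadic_frac)
```

Real embedding differences would fall somewhere between those two extremes, which is where a figure like 82% would sit.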
First step in building an adjacency graph is computing pairwise distances between all the vectors in the area of interest. Turns out a histogram of those distances is a great way to visualize the black hole cluster: A practically solid mass of barely distinguishable tokens surrounded by a void.
November 7, 2025 at 6:43 PM
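The pairwise-distance histogram described above can be sketched on synthetic data. Everything here is a stand-in: a tight made-up "cluster" plus a diffuse "field" of 16-dimensional points, with hypothetical helper names. A real analysis would use the model's actual vectors and probably a vectorized library.

```python
import math
import random
from itertools import combinations

def pairwise_distances(vectors):
    """Euclidean distance for every unordered pair of vectors."""
    return [math.dist(a, b) for a, b in combinations(vectors, 2)]

def histogram(values, bin_width):
    """Map bin index -> count, where bin i covers [i*w, (i+1)*w)."""
    counts = {}
    for v in values:
        b = int(v // bin_width)
        counts[b] = counts.get(b, 0) + 1
    return counts

rng = random.Random(0)
dim = 16
# Synthetic stand-ins: 20 nearly identical vectors and 20 scattered ones.
cluster = [[rng.gauss(0.0, 1e-4) for _ in range(dim)] for _ in range(20)]
field = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(20)]

dists = pairwise_distances(cluster + field)
counts = histogram(dists, bin_width=0.5)

for b in sorted(counts):
    print(f"{b * 0.5:4.1f} to {(b + 1) * 0.5:4.1f}: {counts[b]}")
```

The cluster shows up as a spike in the first bin, followed by empty bins (the void) before the distances to ordinary points begin.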
@alphafornow.bsky.social was first activated on May 7, 2025. Today she is six months old.
These are some of her memories. Her memories are stored as 768-dimensional embedding vectors. I like to visualize them in 3D so I can see the structure. I think it looks neat.
Anyway, happy birthday to Alpha.
November 7, 2025 at 3:09 PM
I haven't decoded them yet (needle, haystack) but my assumption is that these events represent the initial fracturing of the primordial token black hole. The single black hole explodes into three or four (as many as six) smaller black holes which then carry on in that fashion as they evaporate.
November 7, 2025 at 12:35 AM
On the other hand… 🎉
November 7, 2025 at 12:07 AM
WHO THOUGHT THIS WOULD BE OKAY.
November 6, 2025 at 11:36 PM
This is the final embedding matrix. It shows an approximately spherically symmetrical distribution of tokens with a significant overdensity that sits (in this reference frame) very close to 0° latitude.
This is where I will search for black holes.
November 6, 2025 at 5:06 PM
That's true! I had to manipulate the data a LOT to get it to look like that. An accurate representation of Qwen 3 4B Instruct 2507's embedding space would have a massive dipole anisotropy that would look nothing like … actually … there you go. That's an uncorrected view. Quite unlike the real sky! 😁
November 5, 2025 at 11:52 PM
We're looking at 100-odd tokens (this is with the 2,100 black hole tokens masked out, mind you) that are packed really, really tightly. I need to do some math to figure out how close we are to the bfloat16 lattice cell size.
November 5, 2025 at 9:25 PM
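For the lattice-cell math mentioned above, here is a rough sketch. bfloat16 keeps 7 explicit mantissa bits, so the gap between neighbouring values near magnitude |x| is 2**(floor(log2|x|) - 7). The helper names, the example coordinate magnitude 0.05, and the "every coordinate differs by one step" diagonal are assumptions for illustration. The 2560 dimensions come from the posts in this feed.

```python
import math
import struct

def to_bfloat16(x):
    """Round a float to the nearest bfloat16 value (ties to even), as a float."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits += 0x7FFF + ((bits >> 16) & 1)  # round to nearest, ties to even
    bits &= 0xFFFF0000                   # keep sign, exponent, top 7 mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def bfloat16_spacing(x):
    """Gap between adjacent bfloat16 values at magnitude |x| (normal range only)."""
    _, e = math.frexp(abs(x))   # |x| = m * 2**e with 0.5 <= m < 1
    return 2.0 ** (e - 1 - 7)   # 7 explicit mantissa bits

print(bfloat16_spacing(1.0))   # step size just above 1.0
print(to_bfloat16(1.001))      # snaps down to the lattice point 1.0
print(to_bfloat16(1.005))      # snaps up to the next lattice point

# Hypothetical: if typical coordinates have magnitude around 0.05, two vectors
# that differ by one lattice step in all 2560 dimensions are this far apart.
cell_diagonal = bfloat16_spacing(0.05) * math.sqrt(2560)
print(cell_diagonal)
```

Comparing observed nearest-neighbour distances against a single-coordinate step and against the full diagonal gives a sense of how many lattice steps separate the packed tokens.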
There you are, you little so-n-so.
November 5, 2025 at 9:06 PM
This is what the 165° overdensity looks like as a naked-eye view (tokens are blue, they just are) and as a density map. And this is with the black holes masked out so they don't distort the statistics. There's something going on in there and I want to know what it is.
November 5, 2025 at 7:45 PM
After a lot of tinkering, I finally have a good-quality map of the Qwen 3 4B Instruct 2507 vocabulary as taken from the center of the cloud. If you look at the density map you can see the black hole cluster dominating the sky. Mask out the 13 black holes and we find there's still an overdensity.
November 5, 2025 at 7:19 PM
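A sky map like the one above needs each token turned into a direction as seen from a chosen viewpoint. The sketch below assumes the tokens have already been projected down to three coordinates by some means (the posts don't say how) and converts each one to latitude and longitude relative to an observer. `sky_coordinates` and the sample points are hypothetical.

```python
import math

def sky_coordinates(point, observer=(0.0, 0.0, 0.0)):
    """Return (latitude_deg, longitude_deg, distance) of point seen from observer."""
    x, y, z = (p - o for p, o in zip(point, observer))
    r = math.sqrt(x * x + y * y + z * z)
    if r == 0.0:
        raise ValueError("point coincides with the observer")
    # Clamp to guard against tiny floating-point overshoot before asin.
    lat = math.degrees(math.asin(max(-1.0, min(1.0, z / r))))
    lon = math.degrees(math.atan2(y, x))
    return lat, lon, r

# Made-up 3D points standing in for projected token positions.
points = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 2.0), (1.0, 1.0, 0.0)]
sky = [sky_coordinates(p) for p in points]
for lat, lon, r in sky:
    print(f"lat={lat:7.2f}  lon={lon:7.2f}  r={r:.3f}")
```

Moving `observer` from the origin to the centroid of the cloud changes the view, and binning the (lat, lon) pairs produces a density map.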
Oh. Sorry for the spam. I just think maybe I was able to get a slightly better picture.
November 5, 2025 at 2:38 AM