rasib
banner
rasib.bsky.social
rasib
@rasib.bsky.social
I like programming and maths. I think databases are evil.

hobbies: coding, art, books, japanese
Will write the rest in Obsidian now :)
October 24, 2025 at 8:45 PM
Sidenote:

This does make me wonder how a similar migration would look like in Postgres.

They use heap files so their primary and secondary index should be equally fast... But do heap files imply that even our initial naive approach would work just fine?
October 24, 2025 at 8:38 PM
This made the migration go from taking about 3 days to less than 5 minutes!
October 24, 2025 at 8:36 PM
The solution in our case was simple:

We queried the rows to get a list of all affected primary keys first, and then we ran the UPDATE query over this list.
October 24, 2025 at 8:36 PM
Secondary indexes, on the other hand, store pointer to primary keys. They look kinda like this:

age=22 → primary_key=101
age=29 → primary_key=103
age=35 → primary_key=120

So filtering by secondary index still requires about O(k log n)
October 24, 2025 at 8:33 PM
MySQL stores data in Index-organized table.

In an IOT, the primary index (B-Tree) also stores the data. So any operation on primary key requires only a single B-tree lookup.
October 24, 2025 at 8:29 PM
This made the migration go from taking about 3 days to less than 5 minutes!
October 24, 2025 at 8:19 PM
So by calling in a database expert, we fixed our script by first querying the rows first to get a list of all affected primary key, and then running the UPDATE query over this list.
October 24, 2025 at 8:15 PM
We would know how dumb our approach was if we knew how MySQL worked.
October 24, 2025 at 8:10 PM
When we ran the migration (5 am :) off peak hours), we saw CPU util to rise to consistent 99%, and logs showed us that expected completion time was around 72 hours. So we aborted almost immediately.
October 24, 2025 at 8:10 PM
Now, I wonder how this would look like in Postgres. They use heap files so their primary and secondary index should be equally fast.
October 24, 2025 at 8:00 PM
So by querying the rows first to get a list of all affected primary key, and then runing the UPDATE query over this list: we went from the migration taking us about 3 days to less than 5 minutes.
October 24, 2025 at 7:58 PM
In an IOT, the primary B-Tree also stores the data. So any operation on primary key is requires only a B-tree lookup. Secondary indexes on the otherhand store pointer to primary keys kinda like this:

age=22 → primary_key=101
age=29 → primary_key=103
age=35 → primary_key=120
October 24, 2025 at 7:54 PM
We would know how dumb our approach was if we knew MySQL stored data in Index-organized table (IOT).
October 24, 2025 at 7:54 PM
We saw the migration script caused CPU util to go to 99% and was expected to take over 3 days to complete.
October 24, 2025 at 7:50 PM
Wide Column Stores are conceptually multidimensional map with hierarchical indexes. But in storage instead of nesting, they are flat list with a row key sorted by column family.
October 24, 2025 at 6:22 PM
Seems like column-layout also have better compression and cache utilization and they might be easier to vectorize (pandas, SIMD).
October 23, 2025 at 9:02 PM
Storing metadata to map back to rows is important for other queries though. Every data point holding a key is probably too much duplication. Using offset based matching seems more reasonable.
October 23, 2025 at 8:30 PM
Column-Oriented stores like ClickHouse are better at complex aggregates quires like finding trends, averages etc.
October 23, 2025 at 8:28 PM
Column vs Row-Oriented DBMS difference is interesting. I didn't know MySQL and PostgreSQL try to store each row together in a page. I guess this means it's cheap to select * instead of specific columns since we have to read same number of pages anyway.
October 23, 2025 at 5:48 PM
It was an fun watch. Got me excited enough to start your Crust of Rust series :)
October 15, 2025 at 4:26 PM