avi.im
v
@avi.im
breaking databases @tur.so W1 '21 @recursecenter.bsky.social


excited about databases, storage engines and message queues
I explained about SQLite's neat little trick here about reserved space management per page (5/5)

bsky.app/profile/did:...
avi.im v @avi.im · 20d
The B Tree data structure fascinates me. Databases use B Trees to store data on disk, organizing everything into pages that typically range from 4kb to 8kb. All I/O operations happen in units of these pages.

The page looks like this... (1/9)
October 28, 2025 at 3:14 PM
For example, if we use AEGIS-256 with a nonce size of 32 bytes and a 16-byte tag, you'd need extra space for 48 bytes per page.

During decryption, you read the tag and nonce from the reserved space and provide them to the decryption algorithm. (4/5)
October 28, 2025 at 3:14 PM
The size of this tag and nonce varies by algorithm.

To make space, SQLite uses reserved space per page. Once the page is encrypted, this portion can carry the metadata. (3/5)
October 28, 2025 at 3:14 PM
Since we encrypt each page separately, we should use a different nonce for different pages. Even the same page, when encrypted again, should use a different nonce for better security™️

So during encryption, we generate a secure random nonce every time. (2/5)
October 28, 2025 at 3:14 PM
Btw, SQLite also has `secure_delete` pragma setting which overwrites deleted content with zeros. (9/9)
October 26, 2025 at 2:38 PM
SQLite uses this neat trick for its "reserved space" feature. Extensions can store any data they want in each page without interfering with B Tree operations.

Extensions like encryption and checksums need to store extra metadata per page, and they use this reserved space. (8/9)
October 26, 2025 at 2:38 PM
Now, remember how I said cells start from the rightmost end? If you intentionally leave some space at that end before starting the cells, the B Tree would work exactly the same. That space can contain anything, and the B Tree would never touch it since there's no pointer to it. (7/9)

October 26, 2025 at 2:38 PM
This means garbage data from previous uses can actually get written back to disk. I don't know any other data structure that works like this. (6/9)
October 26, 2025 at 2:38 PM
Calling memset (the API that zeroes every bit) adds latency, so as an optimization, databases might skip it entirely. Similarly, a deletion of cell is just removing the pointer, but the data might remain as it is. (5/9)
October 26, 2025 at 2:38 PM
It could have garbage data sitting around, but the page would never access it since no cell pointer references it.

Databases also maintain a buffer pool, think of it as a cache of pages loaded from disk. These pages often get reused. (4/9)
October 26, 2025 at 2:38 PM
while the cells themselves grow from right to left, meeting somewhere in the middle.

That middle section is logically free space, but here's the interesting part is it can contain any data. Neither the page nor the B Tree cares about what's in there. (3/9)
October 26, 2025 at 2:38 PM
One common way to organize data within a page is the slotted page structure. It starts with a header, followed by a bunch of cell pointers. These pointers reference cells at the end of the page. As you add more data, the pointers grow from left to right (2/9)
October 26, 2025 at 2:37 PM
seems it works on Maria too

mariadb.com/docs/server/...
Query Limits and Timeouts | MariaDB Documentation
mariadb.com
October 23, 2025 at 5:58 AM
MySQL :: MySQL 8.4 Reference Manual :: 6.5.1.6 mysql Client Tips
dev.mysql.com
October 22, 2025 at 3:19 PM
BTW, SQL_SAFE_UPDATES also auto-enables two other protections:

- Limits SELECT to 1,000 rows (no more accidental SELECT * on giant tables)

- Caps joins at 1M row combinations

You can override these with --select-limit and --max-join-size if needed.
October 22, 2025 at 3:19 PM
PS: I'm taking some liberties calling these "shards" — they're really just isolated databases. But I'd still call it a single logical db since they share a schema and present a unified interface, even if you can't query across them.
October 19, 2025 at 1:43 PM
You'd have to handle schema management manually or do it in the application layer. You also can't do mass migrations easily.

I call this Limitless Sharding.
October 19, 2025 at 1:43 PM
Even on a tiny machine, I can run millions of SQLite databases.

The downsides: You'd only need this at "scale". The entire architecture assumes you don't need cross-shard queries. So this only works for applications where each document is truly an individual entity.
October 19, 2025 at 1:43 PM
Since some of these embedded databases can run as WASM in the browser, each document could sync its own file directly with a backend db with some CRDT.

I'd bet this model can handle way more writes per second than a sharded Postgres/MySQL setup.
October 19, 2025 at 1:43 PM
A Figma document could just be stored as a SQLite file in the backend. You'd also need libSQL or SQLite with lightstream for backup/replication.
October 19, 2025 at 1:43 PM
If each document is its own database, it can easily handle all requests without breaking a sweat.

Running a million Postgres or MySQL instances sounds crazy and is total overkill. But embedded databases like SQLite, lmdb, and RocksDB fit this perfectly. They're just files!
October 19, 2025 at 1:43 PM
What if we ran disconnected databases that all acted as a single logical db?

Think of applications like Notion, Figma, or Google Docs — each document is disconnected from the others. The key insight: each document gets very few write requests since users are making changes manually.
October 19, 2025 at 1:43 PM