avi.im
v
@avi.im
breaking databases @tur.so W1 '21 @recursecenter.bsky.social


excited about databases, storage engines and message queues
AEADs produce an authentication tag during encryption. Each page also needs a nonce. Both the nonce & the tag become metadata for an encrypted page

So where do you store them? We could store them separately, but it's much better & neater to store them in the page itself (1/5)
October 28, 2025 at 3:14 PM
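A minimal sketch of what "store them in the page itself" could look like, assuming a 4096-byte page with the nonce and tag kept in a reserved area at the end. The sizes (12-byte nonce, 16-byte tag, typical for AES-GCM) and the helper names `pack_page` and `unpack_page` are assumptions for illustration, not taken from the thread. No encryption happens here; the sketch only shows the layout.

```python
PAGE_SIZE = 4096    # assumed page size
NONCE_SIZE = 12     # assumed nonce length (typical for AES-GCM)
TAG_SIZE = 16       # assumed tag length (typical for AES-GCM)
RESERVED = NONCE_SIZE + TAG_SIZE
PAYLOAD_SIZE = PAGE_SIZE - RESERVED  # bytes left for the ciphertext


def pack_page(ciphertext: bytes, nonce: bytes, tag: bytes) -> bytes:
    """Lay out one on-disk page as: ciphertext | nonce | tag."""
    if len(ciphertext) != PAYLOAD_SIZE:
        raise ValueError("ciphertext must fill the payload area exactly")
    if len(nonce) != NONCE_SIZE or len(tag) != TAG_SIZE:
        raise ValueError("unexpected nonce or tag length")
    return ciphertext + nonce + tag


def unpack_page(page: bytes):
    """Split an on-disk page back into (ciphertext, nonce, tag)."""
    if len(page) != PAGE_SIZE:
        raise ValueError("page has the wrong size")
    ciphertext = page[:PAYLOAD_SIZE]
    nonce = page[PAYLOAD_SIZE:PAYLOAD_SIZE + NONCE_SIZE]
    tag = page[PAYLOAD_SIZE + NONCE_SIZE:]
    return ciphertext, nonce, tag


# Placeholder bytes stand in for real AEAD output.
example_page = pack_page(bytes(PAYLOAD_SIZE), b"\x01" * NONCE_SIZE, b"\x02" * TAG_SIZE)
```

The trade-off in this layout is that every page gives up 28 bytes of payload, but a single page read brings back everything needed to decrypt and verify it.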
The B Tree data structure fascinates me. Databases use B Trees to store data on disk, organizing everything into pages that typically range from 4 KB to 8 KB. All I/O operations happen in units of these pages.

The page looks like this... (1/9)
October 26, 2025 at 2:37 PM
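To make "all I/O happens in units of pages" concrete, here is a rough sketch of page-granular reads and writes against a file-like object. The 4096-byte page size and the helper names `read_page` and `write_page` are assumptions for illustration; the sketch says nothing about what sits inside a page.

```python
import io

PAGE_SIZE = 4096  # assumed page size


def read_page(f, page_no: int) -> bytes:
    """Read exactly one page; page N lives at byte offset N * PAGE_SIZE."""
    f.seek(page_no * PAGE_SIZE)
    data = f.read(PAGE_SIZE)
    if len(data) != PAGE_SIZE:
        raise EOFError(f"page {page_no} is missing or truncated")
    return data


def write_page(f, page_no: int, data: bytes) -> None:
    """Overwrite exactly one page; partial pages are rejected."""
    if len(data) != PAGE_SIZE:
        raise ValueError("a page write must be exactly PAGE_SIZE bytes")
    f.seek(page_no * PAGE_SIZE)
    f.write(data)


# An in-memory stand-in for a three-page database file.
db_file = io.BytesIO(bytes(3 * PAGE_SIZE))
write_page(db_file, 1, b"\xab" * PAGE_SIZE)
middle_page = read_page(db_file, 1)
```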
Pro database tip: enable `SQL_SAFE_UPDATES` in MySQL to avoid accidental UPDATE/DELETE queries without a WHERE clause.

It forces you to use a key or a LIMIT, instead of wiping a whole table by mistake at 2:19am.
October 22, 2025 at 3:19 PM
Sharding. Database sharding is one of the common techniques to scale a database horizontally. You split the db into small parts called shards and distribute them across machines.

Shard counts typically run in the low hundreds, or even thousands for extremely large databases.
October 19, 2025 at 1:43 PM
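One common way to decide which shard a row lands on is to hash its key and take the result modulo the shard count. The sketch below assumes that scheme; the function name `shard_for` and the choice of SHA-256 are illustrative, and the post does not say which routing method it has in mind.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index in [0, num_shards) using a stable hash."""
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first 8 bytes as an unsigned integer, then reduce to a shard index.
    return int.from_bytes(digest[:8], "big") % num_shards


# Route a few hypothetical user IDs across 256 shards.
routes = {user: shard_for(user, 256) for user in ("user:1", "user:2", "user:3")}
```

Plain modulo routing moves most keys when the shard count changes, which is why systems that reshard often use consistent hashing or a lookup table instead.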
SQLite has a page where they explain why they use C. They specifically elaborate on why not Rust

www.sqlite.org/whyc....
October 17, 2025 at 4:33 AM
Lil trivia to remember when it comes to Snapshot Isolation

jepsen.io/consistenc...
September 23, 2025 at 4:41 AM
Reposted by v
excited to share that we are following through on our earlier commitments and putting together an independent+neutral organization to house the DID PLC system, including the directory service
Creating an Independent Public Ledger of Credentials (PLC) Directory Organization | Bluesky
The Bluesky Social app is built on an open network protocol that refers to each user by a unique Decentralized Identifier, or DID (a W3C standard). The most popular supported DID method was developed ...
docs.bsky.app
September 19, 2025 at 9:31 AM
Reposted by v
Next week is the start of @db.cs.cmu.edu's latest seminar series: Future Data Systems
@samarchdb.bsky.social and I are hosting speakers from leading systems in the datalake / lakehouse space.
Mondays @ 4:30pm ET via Zoom. Open to the public. Videos posted to YouTube: db.cs.cmu.edu/seminars/fal...
September 17, 2025 at 11:15 PM
The correct answer is either. Transaction B gets a snapshot that may or may not include the changes from A.

SI does not provide real time guarantees. If you need that, you need Strict Serializability, which guarantees that transactions are ordered in real time.
September 18, 2025 at 2:19 PM
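A toy model of the answer above, assuming a multi-version store in which every committed write gets a timestamp and a reader sees the newest version at or before its snapshot timestamp. The class `VersionedStore` and its methods are hypothetical. The sketch also assumes the database may hand B any snapshot taken at or before B's start; which snapshot a real engine picks varies.

```python
class VersionedStore:
    """Toy multi-version store: each key keeps (commit_ts, value) pairs."""

    def __init__(self):
        self._versions = {}  # key -> list of (commit_ts, value), oldest first
        self._clock = 0      # logical commit counter

    def commit_write(self, key, value) -> int:
        """Commit a new version of key and return its commit timestamp."""
        self._clock += 1
        self._versions.setdefault(key, []).append((self._clock, value))
        return self._clock

    def latest_ts(self) -> int:
        return self._clock

    def read_at(self, key, snapshot_ts: int):
        """Return the newest value committed at or before snapshot_ts."""
        value = None
        for commit_ts, candidate in self._versions.get(key, []):
            if commit_ts <= snapshot_ts:
                value = candidate
        return value


store = VersionedStore()
store.commit_write("x", "before A")
ts_before_a = store.latest_ts()
store.commit_write("x", "written by A")  # transaction A commits here

# Transaction B starts only now, after A has committed.
fresh_read = store.read_at("x", store.latest_ts())  # snapshot includes A
stale_read = store.read_at("x", ts_before_a)        # older snapshot, still consistent
```

Both reads come from a consistent snapshot, which is all this model of SI requires. Ruling out the stale one requires the real-time ordering that Strict Serializability adds.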
Database systems question

Assume the database is in snapshot isolation mode. If transaction A updates x and commits, and *then* transaction B starts and reads x's value, B will see (assume a single node for simplicity):

1 - Value before A's write
2 - Value written by A
3 - Either
4 - 🍿
September 16, 2025 at 2:53 PM
Published a new blog post: Setsum - order agnostic, additive, subtractive checksum

post - avi.im/blag/2025/setsum

code - github.com/avinassh/...
September 13, 2025 at 2:50 PM
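A simplified sketch of the idea in the title: a checksum where insertion order does not matter and items can be added or removed. It hashes each item to a number and adds or subtracts it modulo a prime. This is not the Setsum implementation from the linked post or repository; the single modulus, the use of SHA-256, and the class name `SetChecksum` are assumptions chosen to keep the example short.

```python
import hashlib

MODULUS = (1 << 61) - 1  # a Mersenne prime, picked arbitrarily for this sketch


def _item_hash(item: bytes) -> int:
    """Hash an item to an integer in [0, MODULUS)."""
    digest = hashlib.sha256(item).digest()
    return int.from_bytes(digest[:8], "big") % MODULUS


class SetChecksum:
    """Order-agnostic checksum supporting insert, remove, and merge."""

    def __init__(self, value: int = 0):
        self.value = value

    def insert(self, item: bytes) -> None:
        self.value = (self.value + _item_hash(item)) % MODULUS

    def remove(self, item: bytes) -> None:
        self.value = (self.value - _item_hash(item)) % MODULUS

    def merge(self, other: "SetChecksum") -> "SetChecksum":
        """Checksum of the combined contents of two collections."""
        return SetChecksum((self.value + other.value) % MODULUS)


forward = SetChecksum()
for item in (b"apple", b"banana", b"cherry"):
    forward.insert(item)

backward = SetChecksum()
for item in (b"cherry", b"banana", b"apple"):
    backward.insert(item)
```

Because addition is commutative, `forward.value` and `backward.value` match, and removing an item undoes its insertion exactly.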
The great lock in is here! For those wanting to get into systems programming and/or database internals, consider hacking on Turso DB, the SQLite rewrite in Rust. Here's why:

1. It's a database!
September 11, 2025 at 2:17 PM
Reposted by v
Hey we're hiring for in-person engineering roles in SF. I really enjoy my job and you might too. Come hang out and build developer tools!
September 8, 2025 at 10:45 PM
Where can I learn about how AI companies use caching, KV stores, and databases differently for LLMs, agentic workloads?

Someone also mentioned to me that old/traditional services aren't suited for these, so they also build databases internally. (e.g., OpenAI acquired Rockset)
September 9, 2025 at 2:27 PM
Reposted by v
For nearly a decade, MongoDB provided reliable persistence with one of the most robust storage engines. Yet somehow, the oldest jokes keep persisting, too, so here are some facts:

dev.to/franckpachot...
Resilience of MongoDB's WiredTiger Storage Engine to Disk Failure Compared to PostgreSQL and Oracle
There have been jokes that have contributed to persistent myths about MongoDB's durability. The...
dev.to
September 9, 2025 at 7:22 AM
Reposted by v
Netflix had it all wrong, don’t waste engineering resources to build your own chaos monkey infrastructure, just put production on AWS us-east-1 and you get chaos monkey for free.

Just kidding, just kidding…
September 7, 2025 at 10:32 PM
This is the opening text of Transaction Processing: Concepts and Techniques by Jim Gray

"Six thousand years ago, the Sumerians invented writing for transaction processing."

September 7, 2025 at 1:38 PM
Published a new post: Oldest recorded transaction.

This totally could have been just a tweet (skeet?), but I wanted to publish something today.

avi.im/blag/2025/old...
September 6, 2025 at 2:33 PM
Which is the best local LLM to set up for asking questions about code?

I have large codebases like the Linux kernel, Postgres, etc. I want to ask questions like "find methods that do XYZ" and also post large code snippets and ask for explanations.
September 6, 2025 at 7:27 AM
Reposted by v
If you're curious about #LeanLang and want to understand the connection between #programming and #proofs, check out this great new video by Ank Yog. The analogy between Chess and true propositions is particularly compelling!

www.youtube.com/watch?v=QXQN...
September 5, 2025 at 3:58 PM
My extreme opinion is that anything other than serializable isolation is a scam. Database people haven't figured out how to make it fast, so we have ended up with other half baked isolation levels.
September 5, 2025 at 1:38 PM
This is the oldest transaction database from 3100 BC - recording accounts of malt and barley groats.

This thing survived 5000 years (holy shit!) with zero downtime, and it has stronger durability guarantees than most databases today.

I call it rock solid durability.
September 4, 2025 at 1:38 PM
Reposted by v
Yes! Here's a recording from @devoxx.com: youtu.be/zOOFMHAjoPI?...
Keep Your Cache Always Fresh with Debezium! by Gunnar Morling
YouTube video by Devoxx
youtu.be
September 4, 2025 at 12:44 PM
Reposted by v
Nice one! Spoke about this kind of architecture a while ago too: speakerdeck.com/gunnarmorlin.... Back then, using Kafka Streams as an IVM engine and Infinispan as the serving layer for denormalized views, but Sqlite (or DuckDB) would work too, with better queryability, as you say m
Keep your cache always fresh with Debezium! (Current 22)
The saying goes that there are only two hard things in Computer Science: cache invalidation, and naming things. Well, turns out the first one is solved …
speakerdeck.com
September 3, 2025 at 3:00 PM