Alex Miller
alexmillerdb.bsky.social
Database Papers as a Service
So what’s the feature set difference between pgDog, Neki, and multigres?
November 15, 2025 at 11:56 PM
Reposted by Alex Miller
Our next event is on November 19th at StarTree’s office in downtown Mountain View. Come hear about Morel from Julian Hyde and Query Optimization as a Service from Yuanyuan Tian!
luma.com/xygolo9c
South Bay Systems: Morel / Query Optimization as a Service · Luma
Welcome to another edition of South Bay Systems! This time we bring you two wonderful talks: Julian Hyde will be speaking about Morel, a new functional…
luma.com
November 7, 2025 at 1:40 AM
Reposted by Alex Miller
The recording from our last South Bay Systems meetup is now available!
youtu.be/f1bz3efUJpM
Apache Pinot on Object Storage & JSON in Apache Doris
YouTube video by South Bay Systems
youtu.be
November 1, 2025 at 5:56 PM
Reposted by Alex Miller
@abigalekim.bsky.social @xiangpeng.systems and I are kicking off Madison Systems with a coffee chat on Sunday, Nov 9th. Come nerd out on systems!

luma.com/v69tvpla
Madison Systems Coffee Chat · Luma
If you’re working on or are interested in anything in the space of software internals (compilers, databases, operating systems, etc.), come grab a cup of…
luma.com
October 23, 2025 at 6:52 PM
[PVLDB] Enhancing Transaction Processing through Indirection Skipping
www.vldb.org/pvldb/v...

Whereas VMCache improves on pointer swizzling's complexity by removing the swizzling, this work points out that page and frame hints are highly effective, and that it's okay if they're wrong.
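
To make "okay if they're wrong" concrete, here is a minimal sketch of a hint-validated lookup. It is not the paper's implementation: `BufferPool`, `PageRef`, `frame_hint`, and `slow_lookups` are hypothetical names for illustration. The idea is that a reference remembers which frame it last found its page in, checks that guess first, and falls back to the page table (refreshing the hint) when the guess is stale.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Frame:
    page_id: Optional[int] = None  # which page currently occupies this frame
    data: bytes = b""


@dataclass
class PageRef:
    page_id: int
    frame_hint: int = -1  # last known frame index; allowed to be stale


class BufferPool:
    def __init__(self, num_frames: int) -> None:
        self.frames = [Frame() for _ in range(num_frames)]
        self.page_table = {}  # authoritative mapping: page_id -> frame index
        self.slow_lookups = 0  # counts how often the hint missed

    def load(self, page_id: int, frame_idx: int, data: bytes) -> None:
        """Place a page in a frame, evicting whatever page was there."""
        old = self.frames[frame_idx].page_id
        if old is not None:
            self.page_table.pop(old, None)
        self.frames[frame_idx] = Frame(page_id, data)
        self.page_table[page_id] = frame_idx

    def fetch(self, ref: PageRef) -> bytes:
        """Try the hint first; a wrong hint only costs one extra comparison."""
        h = ref.frame_hint
        if 0 <= h < len(self.frames) and self.frames[h].page_id == ref.page_id:
            return self.frames[h].data
        # Hint was missing or stale: take the indirection and refresh the hint.
        self.slow_lookups += 1
        idx = self.page_table[ref.page_id]  # raises KeyError if not resident
        ref.frame_hint = idx
        return self.frames[idx].data
```

Because the hint is validated against the frame's own `page_id`, nothing has to find and fix outstanding references when a page moves or is evicted; a stale hint just takes the slow path once.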
October 17, 2025 at 4:02 AM
This reminded me I've been sitting on draft blog posts about Copy-and-Patch JIT compilation for a while, and so I've finally published the first chunk of it: a minimal tutorial and explanation of how and why Copy-and-Patch actually works.

Start at transactional.blog/copy-and-pat...
October 13, 2025 at 11:26 PM
Reposted by Alex Miller
South Bay Systems returns on October 27th at Adobe in downtown San Jose. We have an Analytics-on-Object-Storage double feature this time starring two different Apache projects: Apache Pinot and Apache Doris. (Talk descriptions below.)

Register now!
luma.com/9o6bahgc
South Bay Systems: Apache Pinot on Object Storage / Variants in Apache Doris · Luma
Welcome to another edition of South Bay Systems! This time, we'll have a double feature! First we'll have Songqiao Su and Raghav Yadav talking about…
luma.com
October 13, 2025 at 6:47 PM
Reposted by Alex Miller
There was an accident with the recording where audio wasn't captured, so instead we can offer a recording from one of Jakob's practice runs on twitch: www.twitch.tv/videos/25845...
October 7, 2025 at 5:26 PM
Reposted by Alex Miller
Had a fun time at the South Bay Systems meetup last night. Thanks @yugabytedb.bsky.social for hosting!

@codedrift.social gave a great talk on WebAssembly: what it is (and isn't), how it connects to WASI, and promising projects. He cuts through a lot of the hype vs. reality. Recording coming soon.
October 3, 2025 at 10:38 PM
[ASPLOS'25] Fusion: An Analytics Object Store Optimized for Query Pushdown
www.cs.princeton.edu...

Tightly integrating an Iceberg catalog with an object store means that one could make file-format aware erasure coding decisions, to permit pushing down filters and aggregations.
September 28, 2025 at 11:42 PM
[VLDB] Towards Principled, Practical Document Database Design
www.vldb.org/pvldb/v...

If you've ever wished that there was a document database equivalent for relational databases' 3NF-style schema design guidance, then this is the paper for you.
September 23, 2025 at 5:23 PM
[arXiv] On the Theoretical Limitations of Embedding-Based Retrieval
arxiv.org/abs/2508.2...

It's impossible to retrieve all combinations of pairs of documents post-embedding. Thus, there are use cases that vector search won't do well at. Conversely, BM25 excels in these cases.
September 21, 2025 at 3:38 AM
I text-to-speech papers often, and www.paper2audio.com finally did the one thing that I was hoping AI would enable: replace tables/figures/diagrams with a summary of what is being shown. It makes table/diagram-heavy papers actually comprehensible. There are iOS and Android apps, and it's free.
September 11, 2025 at 9:21 PM
[VLDB] NaviX: A Native Vector Index Design for Graph DBMSs With Robust Predicate-Agnostic Search Performance
www.vldb.org/pvldb/v...

It feels like a follow-on/improvement to ACORN. Also interesting to see HNSW built directly on a graph database working well.
September 5, 2025 at 5:11 AM
Someone should go implement a btree bulk-loading mechanism that relies on man7.org/linux/man-pa... to prepare a tree of data and then atomically drop it into the main btree file as a sub-tree. That'd be pretty cool to read about.
August 21, 2025 at 7:44 PM
There’s surprisingly been no good citation for follower reads and the trade-offs therein. Super excited that this finally got published. law-theorem.com had “Coming soon!” for a few years 😭
Vol:18 No:9 → The LAW theorem: Local Reads and Linearizable Asynchronous Replication
👥 Authors: Emmanouil Giortamis, Antonios Katsarakis, Vasilis Gavrielatos, Pramod Bhatotia, Aleksandar Dragojevic, Boris Grot, Vijay Nagarajan,...
📄 PDF: https://www.vldb.org/pvldb/vol18/p2831-giortamis.pdf
August 20, 2025 at 4:51 PM
For anyone else trying to catch up on DBSP, my recommended flow of learning is:
1. Watch the talk: www.youtube.com/watch?v=omOH... (h/t @wslim.bsky.social)
2. Read the spec/book: mihaibudiu.github.io/work/dbsp-sp... (h/t @avi.im)
3. Read the VLDB paper

List is ordered by assumed knowledge of reader
August 19, 2025 at 9:21 PM
In the Postgres-style MVCC vs MySQL-style MVCC debates, I'd really love to see an implementation of time-separated btrees (dl.acm.org/doi/pdf/10.1...) evaluated. It's CoW-BTree style "your path down the tree prunes out versions you don't want to see", but update-in-place and copies only on splits.
August 17, 2025 at 6:29 AM
Reposted by Alex Miller
People often ask me about the differences in architecture between Amazon Dynamo (the 2007 SOSP paper), DynamoDB (the AWS serverless NoSQL database), and Aurora DSQL (the AWS serverless SQL database).

I memoized the response on my blog. brooker.co.za/blog/2025/08...
Dynamo, DynamoDB, and Aurora DSQL - Marc's Blog
brooker.co.za
August 16, 2025 at 1:42 AM
[arXiv] Theseus: A Distributed and Scalable GPU-Accelerated Query Processing Platform Optimized for Efficient Data Movement
arxiv.org/pdf/2508.0...

Great to see the Voltron Data folks writing about their GPU database!
August 15, 2025 at 1:05 AM
Reposted by Alex Miller
Today’s NULL BITMAP is a very special one—I have been doing NULL BITMAP every week for two years, and to celebrate, this week I got a collection of friends to put together a printable zine of articles. I hope you enjoy it! buttondown.com/jaffray/arch...
August 11, 2025 at 6:11 PM
To randomly sample a number of operations, one pulls from a PRNG. github.com/buildup-d... instead shows a cute trick for defining a stateless PRNG: pull RDTSC, run it through a quick hash to scramble the bits (e.g. rapidhash). Cache-miss-free, but you lose determinism in tests.
mysql-server-RP/include/my_rnd.h at 9d88f21761ab9ffe34a4b5831c97e87edfb9c53a · buildup-db/mysql-server-RP
MySQL RP (Restore Performance) is modified version of MySQL Community, to restore performance equal to or better than previous major versions. - buildup-db/mysql-server-RP
github.com
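
A rough sketch of the trick, under loud assumptions: Python can't issue RDTSC, so `time.perf_counter_ns()` stands in for the cycle counter, and a splitmix64-style finalizer stands in for rapidhash. `mix64`, `stateless_random_u64`, and `should_sample` are made-up names, not anything from the linked header.

```python
import time

MASK64 = (1 << 64) - 1


def mix64(x: int) -> int:
    """Scramble 64 bits with the splitmix64 finalizer (stand-in for rapidhash)."""
    x &= MASK64
    x ^= x >> 30
    x = (x * 0xBF58476D1CE4E5B9) & MASK64
    x ^= x >> 27
    x = (x * 0x94D049BB133111EB) & MASK64
    x ^= x >> 31
    return x


def stateless_random_u64() -> int:
    """No seed and no stored state: hash a fast-moving counter on every call."""
    # Assumption: perf_counter_ns() plays the role RDTSC plays in the original.
    return mix64(time.perf_counter_ns())


def should_sample(one_in: int) -> bool:
    """Pick roughly one operation out of every `one_in` for sampling."""
    return stateless_random_u64() % one_in == 0
```

There is no generator state to load or store, which is where the cache-miss-free property comes from. The cost is the one already noted: output depends on when the call happens, so a test cannot replay it from a seed.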
August 9, 2025 at 6:21 AM
Reposted by Alex Miller
We had our biggest @southbaysystems.xyz meetup yet last night! Thanks to everyone who came, and thanks to Databricks for hosting!

@andypavlo.bsky.social discussed the 50-year history of database tuning, applying AI/ML to the problem, and the future of auto-tuning (agentic reasoning, of course).
August 7, 2025 at 5:21 PM
If you have a blog hosted on cloudflare pages and any part of your css looks missing only on safari, it's because of www.cloudflarestatus.com/incidents/ps..., and you have to go purge the cache to get rid of the cached wrongly-compressed asset files.
Cloudflare Pages: Compression issues with custom hostnames
Cloudflare's Status Page - Cloudflare Pages: Compression issues with custom hostnames.
www.cloudflarestatus.com
August 4, 2025 at 8:33 PM
A good read on other, nicer ways that ISAs can represent vectorized loops: www.bitsnbites.eu/three-fundam...
Three fundamental flaws of SIMD ISA:s – Bits'n'Bites
www.bitsnbites.eu
July 30, 2025 at 6:26 PM