Andy Pavlo
@andypavlo.bsky.social
Associate Prof. of Databases @ Carnegie Mellon.
New database leaderboard from Yellowbrick ranks the quality of DBMS optimizer estimates and plans. They only evaluate TPC-H for now and report results for Postgres + DuckDB + MSSQL: sql-arena.com/components/p...
Repo: github.com/sql-arena/db...
LinkedIn Group: www.linkedin.com/groups/15775...
November 3, 2025 at 5:07 PM
Our F3 files embed small WASM programs to decode data. If somebody creates a new encoding and the DBMS does not have a native impl, it can still read the data through WASM by passing Arrow buffers. Our experiments show WASM is 15-20% slower than native. We use @spiraldb.com's Vortex encoding impls.
October 1, 2025 at 1:49 PM
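A minimal sketch of the fallback idea in the post above, assuming a reader that prefers its own native decoder and otherwise uses a decoder shipped with the file. This is not F3's actual API: `NATIVE_DECODERS`, `decode_column`, and `rle_decode` are hypothetical names, and a plain Python callable stands in for the embedded WASM program.

```python
from typing import Callable, Dict, List

Decoder = Callable[[bytes], List[int]]

# Hypothetical registry of encodings this reader implements natively.
NATIVE_DECODERS: Dict[str, Decoder] = {
    "plain": lambda buf: list(buf),
}


def rle_decode(buf: bytes) -> List[int]:
    """Stand-in for a decoder embedded in the file.

    Assumes the payload is a sequence of (count, value) byte pairs,
    so its length is even.
    """
    out: List[int] = []
    for i in range(0, len(buf), 2):
        count, value = buf[i], buf[i + 1]
        out.extend([value] * count)
    return out


def decode_column(encoding: str, payload: bytes,
                  embedded: Dict[str, Decoder]) -> List[int]:
    """Use the native decoder if one exists, else the embedded one."""
    native = NATIVE_DECODERS.get(encoding)
    if native is not None:
        return native(payload)
    fallback = embedded.get(encoding)
    if fallback is None:
        raise ValueError(f"no decoder available for encoding {encoding!r}")
    return fallback(payload)
```

A reader that has never heard of the "rle" encoding can still call `decode_column("rle", payload, {"rle": rle_decode})`, at the cost of the slower fallback path.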
One problem with Parquet is many implementations are not updated when the official spec improves. Everyone just uses the lowest version feature set. That means if Parquet adds a better data encoding scheme and a file uses it, many common reader libraries won't be able to retrieve the data.
October 1, 2025 at 1:49 PM
Our SIGMOD paper with our friends at Tsinghua + @wesmckinney.com + @pateljm.bsky.social on creating a next generation open-source data file format is out. F3 is a future-proof file format that avoids the mistakes of Parquet.
📄 Paper: db.cs.cmu.edu/papers/2025/...
📁 Code: github.com/future-file-...
October 1, 2025 at 1:49 PM
Fall 2025 Seminar Schedule:
Sep 22: Apache Iceberg
Sep 29: Apache Hudi
Oct 06: @motherduck.com
Oct 13: SpiralDB Vortex
Oct 27: @singlestore.com
Nov 03: @deltalakeoss.bsky.social
Nov 10: Mooncake
Nov 17: @firebolthq.bsky.social
Nov 24: @xtdb.com
Dec 01: Apache Polaris
September 17, 2025 at 11:15 PM
Next week is the start of @db.cs.cmu.edu's latest seminar series: Future Data Systems
@samarchdb.bsky.social and I are hosting speakers from leading systems in the datalake / lakehouse space.
Mondays @ 4:30pm ET via Zoom. Open to the public. Videos posted to YouTube: db.cs.cmu.edu/seminars/fal...
September 17, 2025 at 11:15 PM
Thank you to our @db.cs.cmu.edu Affiliate companies for their support this academic year:
@clickhouse.com
@datastax.com
@getdbt.com
@firebolthq.bsky.social
@motherduck.com
• RelationalAI
@singlestore.com
@spiraldb.com
• PingCAP / TiDB
• Yellowbrick
@yugabytedb.bsky.social
August 25, 2025 at 2:29 PM
Everything is available for free to non-CMU students:
• Lectures on YouTube: www.youtube.com/playlist?lis...
• Slides + Notes + Homeworks on course website.
• Project source code on GitHub: github.com/cmu-db/bustub
• Grading with Gradescope (see FAQ ➡️ 15445.courses.cs.cmu.edu/fall2025/faq...)
August 25, 2025 at 2:29 PM
The report of my death was an exaggeration. I am still alive and will be in SFO this week to speak about using LLMs to automatically tune databases. Wed Aug 6th @ 5:30pm at Databricks MTV: lu.ma/ha0dc4nj
August 4, 2025 at 10:10 AM
No system hits the sweet spot of allowing for extensibility while maintaining systems safety. It would be nice if there was a standard plugin API (think POSIX) that allows compatibility across systems.

Thanks to @marcoslot.com + @daveandersen.bsky.social for their collaboration on this project
July 3, 2025 at 7:03 PM
About 16% of PostgreSQL extns are incompatible with at least one other extn. Common problems include not enforcing APIs, undefined behaviors, and memory errors. Heavyweight extensions like Citus + @timescaledb.bsky.social have the most issues because they touch more DBMS internal parts.
July 3, 2025 at 7:03 PM
Abi created a torture chamber that downloads every extension we could find and automatically installs them in different combinations to see what breaks. We expanded our analysis to include other popular open-source DBMSs but could not break them.
Code: github.com/cmu-db/ext-a...
July 3, 2025 at 7:03 PM
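A rough sketch of the "install them in different combinations" idea, assuming pairwise combinations only. It is not the code from the repo above: `find_conflicts`, `incompatible_fraction`, and the `works_together` callback are hypothetical, and the callback stands in for whatever actually installs two extensions and checks that the DBMS still behaves.

```python
from itertools import combinations
from typing import Callable, List, Sequence, Tuple


def find_conflicts(
    extensions: Sequence[str],
    works_together: Callable[[str, str], bool],
) -> List[Tuple[str, str]]:
    """Try every unordered pair and collect the pairs that break."""
    conflicts = []
    for a, b in combinations(extensions, 2):
        if not works_together(a, b):
            conflicts.append((a, b))
    return conflicts


def incompatible_fraction(
    extensions: Sequence[str],
    conflicts: Sequence[Tuple[str, str]],
) -> float:
    """Fraction of extensions that conflict with at least one other."""
    if not extensions:
        return 0.0
    involved = {ext for pair in conflicts for ext in pair}
    return len(involved) / len(extensions)
```

The number of pairs grows quadratically with the number of extensions, so a real harness would likely need to parallelize or sample.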
I was disappointed in my inability to work on database research while in the hospital. It's surprisingly hard to concentrate on important things like databases when you can't breathe. You can see me hooked up to oxygen at the beginning of this seminar talk.
www.youtube.com/watch?v=iPYh...
July 1, 2025 at 2:39 PM
Shots fired by @firebolthq.bsky.social with their new on-prem executable (www.firebolt.io/blog/introdu...). They have dethroned the Umbra system by The Germans™ at @tum.de in the ClickBench rankings: benchmark.clickhouse.com
June 24, 2025 at 11:10 PM
Over on Twitter, it's 2025 and people are still proposing terrible ideas for using blockchain databases.

Why would anyone want to use a P2P BFT DBMS to track disease outbreaks? Put it in a centralized DBMS. At least now only 50% of replies are in favor of it. That's progress...
February 11, 2025 at 12:08 AM
Spring 2025 Seminar Schedule:
Feb 10: @convex.dev
Feb 17: The Germans (TUM)
Feb 24: @apachepinot.bsky.social
Mar 03: Malloy (@lloydtabb.bsky.social)
Mar 10: Google SQL Pipes
Mar 24: PRQL
Mar 31: StarRocks
Apr 07: @oxidecomputer.bsky.social
Apr 14: @mariadb.bsky.social
Apr 21: @edgedb.com
January 29, 2025 at 2:32 PM
Do you hate SQL and wish it would just die? Or do you love SQL and wish it ran faster? If you answered 'yes' to either question then join our Spring 2025 @db.cs.cmu.edu Seminar Series: SQL or Death?
Mondays @ 4:30pm via Zoom. Videos posted to YouTube: db.cs.cmu.edu/seminar2025/
January 29, 2025 at 2:32 PM
Since multiple people asked, @samarchdb.bsky.social's research doesn't really work well on Postgres because it has one of the worst query optimizers we've tested (sorry it's true). It cannot decorrelate subqueries needed to inline UDFs. See Sam's CIDR 2024 paper: db.cs.cmu.edu/papers/2024/...
December 7, 2024 at 3:10 AM
The results are stunning! I've never had a paper where we get 1000x speed ups! We had to make separate log-scale graphs just for queries that were astronomically faster. And yes, DuckDB is *much* faster than MSSQL.

@samarchdb.bsky.social is a freak of nature. He will be on the job market in 2 years.
December 6, 2024 at 2:56 PM
@samarchdb.bsky.social's brilliance is to break the UDF up, pull out the parts you do want to inline, and then codegen (outline) the rest into small functions. You can do a bunch of compiler tricks to make the UDF + SQL work better together. DuckDB even allows you to vectorize the functions too.
December 6, 2024 at 2:56 PM
Microsoft introduced the Froid inlining technique in SQL Server back in 2019 that converts T-SQL UDFs into relational algebra (RA). It then embeds that algebra into the query plan.
Paper: dl.acm.org/doi/10.1145/...
Info: learn.microsoft.com/en-us/sql/re...
December 6, 2024 at 2:56 PM
How to *Not* Run Database Benchmarks 101
Rando is promoting new NanoCube query engine that is 200x faster than DuckDB/SQLite/Polars. Written in Python too! But they get those results by building bitmap indexes and neglecting to include index build times! www.reddit.com/r/Python/com...
November 25, 2024 at 1:05 PM
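A small sketch of the fairer measurement, assuming a toy in-memory column. It does not reproduce NanoCube or its bitmap indexes: a plain dictionary-based inverted index stands in for the index, and `build_index`, `index_count`, `scan_count`, and `timed` are hypothetical helpers. The point is only that the indexed number to compare against a scan is build time plus query time.

```python
import time
from collections import defaultdict


def build_index(rows):
    """Map each value to the list of row positions holding it."""
    index = defaultdict(list)
    for position, value in enumerate(rows):
        index[value].append(position)
    return index


def index_count(index, key):
    """Count matching rows with a single index lookup."""
    return len(index.get(key, []))


def scan_count(rows, key):
    """Count matching rows with a full scan and no index."""
    return sum(1 for value in rows if value == key)


def timed(fn, *args):
    """Return (result, elapsed seconds) for one call."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


rows = [i % 100 for i in range(200_000)]

index, build_s = timed(build_index, rows)
indexed, query_s = timed(index_count, index, 42)
scanned, scan_s = timed(scan_count, rows, 42)

report = {
    "scan_s": scan_s,
    "indexed_query_only_s": query_s,        # the misleading number
    "indexed_total_s": build_s + query_s,   # the number to compare with scan_s
}
```

Reporting only `indexed_query_only_s` hides the cost of building the index. If the index is reused across many queries, it is reasonable to amortize the build time, but the amortization should be stated.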
Right now in Amsterdam
November 22, 2024 at 4:43 PM
The Database Capital of Europe
November 22, 2024 at 11:21 AM
ChatGPT's ranking of the 10 worst database systems:
1. dBase (1979)
2. FileMakerPro (1985)
3. Lotus Notes (1989)
4. Access (1992)
5. Adabas (1971)
6. Pervasive PSQL (1996)
7. Sybase ASE (1987)
8. FrontBase (1993)
9. Empress (1979)
10. InterBase (1984)
August 11, 2023 at 1:58 AM