Andy Pavlo
@andypavlo.bsky.social
Associate Prof. of Databases @ Carnegie Mellon.
Reposted by Andy Pavlo
Today's Future Data Systems Seminar Speaker: Cheng Chen will present how @mooncakelabs.bsky.social extends PostgreSQL to support Apache Iceberg. Zoom talk open to public at 4:30pm ET. YouTube video available after: db.cs.cmu.edu/events/futur...
[Future Data] Mooncake: Real-Time Apache Iceberg Without Compromise - Carnegie Mellon Database Group
Apache Iceberg is great for large-scale analytics, but it was built for...
db.cs.cmu.edu
November 10, 2025 at 12:24 PM
Reposted by Andy Pavlo
Great idea to compare plans across different systems using rows processed. A good yardstick, but slower sort-based plans from Postgres + MSSQL process fewer rows than faster hash-based plans from DuckDB. Postgres rows scanned also seem underreported. Nice to see some competition with ClickBench.
November 3, 2025 at 5:28 PM
New database leaderboard from Yellowbrick ranks the quality of DBMS optimizer estimates and plans. They only evaluate TPC-H for now and report results for Postgres + DuckDB + MSSQL: sql-arena.com/components/p...
Repo: github.com/sql-arena/db...
LinkedIn Group: www.linkedin.com/groups/15775...
November 3, 2025 at 5:07 PM
Reposted by Andy Pavlo
Today's Future Data Systems Seminar Speaker: Ryan Johnson (CMU PhD'10) will present @deltalakeoss.bsky.social's internal architecture and how it supports multi-statement transactions. Zoom talk open to public at 4:30pm ET. YouTube video available after: db.cs.cmu.edu/events/futur...
[Future Data] Multi-statement Transactions in the Databricks Lakehouse - Carnegie Mellon Database Group
The data lake architecture originally focused on self-standing tables in cloud storage,...
db.cs.cmu.edu
November 3, 2025 at 1:46 PM
Reposted by Andy Pavlo
Today's Future Data Systems Seminar Speaker: Joyo Victor will present @singlestore.com's "Bottle Service" meta-data system that supports database branching, change-data-capture, and Apache Iceberg. Zoom talk open to public at 4:30pm ET. YouTube video available after: db.cs.cmu.edu/events/futur...
[Future Data] Storage Metadata for Modern Cloud Databases - Carnegie Mellon Database Group
In modern database architecture, separating compute from storage unlocks powerful capabilities. Our...
db.cs.cmu.edu
October 27, 2025 at 11:39 AM
Lots of database action this week. Yes, I have a new start-up @sydht.ai with my PhD students @wslim.bsky.social + @17zhangw.bsky.social using LLMs to optimize almost everything in PostgreSQL. @datadictum.bsky.social posted a new article on our approach: www.theregister.com/2025/10/22/c...
Researchers tout vector-based automated tuning in PostgreSQL
Researchers say 'Proto-X' fine-tunes databases automatically, delivering multifold performance boosts
www.theregister.com
October 23, 2025 at 3:09 PM
Reposted by Andy Pavlo
Day 2 of #P99CONF is here! The #ScyllaDB Lounge opens at 8:00 am PST, and then we get things started with keynotes from @dorlaor.bsky.social and @andypavlo.bsky.social. Don't forget that all registrants receive Instant Access to the sessions once the conference ends. www.p99conf.io?latest_sfdc_...
October 23, 2025 at 12:14 PM
Reposted by Andy Pavlo
Today's Future Data Systems Seminar Speaker: Ian Cook (@ian.columnar.tech) will present @columnar.tech's work on Apache Arrow's database connectivity API (ADBC). ADBC is available in modern DBMSs. Zoom talk open to public at 4:30pm ET. YouTube video available after: db.cs.cmu.edu/events/futur...
[Future Data] Where We're Going, We Don't Need Rows: Columnar Data Connectivity with ADBC - Carnegie Mellon Database Group
ADBC (Arrow Database Connectivity) is Apache Arrow’s answer to ODBC and JDBC:...
db.cs.cmu.edu
October 20, 2025 at 11:38 AM
Reposted by Andy Pavlo
Today's Future Data Systems Seminar Speaker: Will Manning (@willmanning.com) will present @spiraldb.com's Vortex file format. Vortex is now a @linuxfoundation.org project. Zoom talk open to public at 4:30pm ET. YouTube video available after: db.cs.cmu.edu/events/futur...
[Future Data] Vortex: LLVM for File Formats - Carnegie Mellon Database Group
Apache Parquet revolutionized columnar storage after its initial release in 2013, but...
db.cs.cmu.edu
October 13, 2025 at 11:10 AM
Reposted by Andy Pavlo
BTW if anyone wants a good intro to database storage / Log structured storage (aka LSM trees) @db.cs.cmu.edu lecture this fall is a good one: www.youtube.com/watch?v=2_sT...
#05 - Log-Structured Database Storage ✸ SingleStore Database Talk (CMU Intro to Database Systems)
YouTube video by CMU Database Group
www.youtube.com
October 7, 2025 at 1:32 PM
Reposted by Andy Pavlo
MMAP is incredibly fast when the dataset fits in memory, but it slows to a crawl when it doesn't, especially if the workload is mostly random point lookups. Speaking as someone who built an MMAP-based key-value store before :) Obligatory paper from @andypavlo.bsky.social db.cs.cmu.edu/mmap-cidr2022/
October 11, 2025 at 3:39 PM
Reposted by Andy Pavlo
Today's Future Data Systems Seminar Speaker: Jordan Tigani (@jrdntgn.bsky.social) will present how @motherduck.com supports modern workloads with DuckLake. Zoom talk open to public at 4:30pm ET. YouTube video available after: db.cs.cmu.edu/events/futur...
[Future Data] DuckLake: Learning from Cloud Data Warehouses to Build a Robust "Lakehouse" - Carnegie Mellon Database Group
When building scalable data systems, it is easy to focus on the...
db.cs.cmu.edu
October 6, 2025 at 11:55 AM
Our SIGMOD paper with our friends at Tsinghua + @wesmckinney.com + @pateljm.bsky.social on creating a next generation open-source data file format is out. F3 is a future-proof file format that avoids the mistakes of Parquet.
📄 Paper: db.cs.cmu.edu/papers/2025/...
📁 Code: github.com/future-file-...
October 1, 2025 at 1:49 PM
Reposted by Andy Pavlo
Today's Future Data Systems Seminar Speaker: Vinoth Chandar will present the internals of Apache Hudi and his work at Onehouse. Zoom talk open to public at 4:30pm ET. YouTube video available after: db.cs.cmu.edu/events/futur...
[Future Data] Apache Hudi: A Database Layer over Cloud Storage for Fast Mutations and Efficient Queries - Carnegie Mellon Database Group
Data lakes emerged as a way to store vast amounts of data...
db.cs.cmu.edu
September 29, 2025 at 11:37 AM
Reposted by Andy Pavlo
Today's Future Data Systems Seminar Speaker: Russell Spitzer will present the internals of Apache Iceberg's query planner and execution engine. Zoom talk open to public at 4:30pm ET. YouTube video available after: db.cs.cmu.edu/events/futur...
[Future Data] An Extremely Technical Overview of how the Apache Iceberg™ Planning Implementation Actually Works - Carnegie Mellon Database Group
What are you trying to tell me? That I can read data...
db.cs.cmu.edu
September 22, 2025 at 11:28 AM
Next week is the start of @db.cs.cmu.edu's latest seminar series: Future Data Systems
@samarchdb.bsky.social and I are hosting speakers from leading systems in the datalake / lakehouse space.
Mondays @ 4:30pm ET via Zoom. Open to the public. Videos posted to YouTube: db.cs.cmu.edu/seminars/fal...
September 17, 2025 at 11:15 PM
I don't know what to say. You dream about it for so long and then when it finally happens you're in shock. I'm so proud of you Larry. www.theguardian.com/technology/2...
Larry Ellison overtakes Elon Musk as world’s richest person
Oracle co-founder’s shares rose by 40% in early trading, valuing his fortune at $393bn, just ahead of Musk’s $384bn
www.theguardian.com
September 10, 2025 at 7:52 PM
Reposted by Andy Pavlo
What if a database could be your game engine?

During parental leave, @lukasvogel.bsky.social built DOOMQL: A multiplayer DOOM-like where everything (rendering, game loop, state) runs in pure SQL on CedarDB.
It's fast, ridiculous, and surprisingly elegant.

Full write-up: cedardb.com/blog/doomql
September 9, 2025 at 3:17 PM
Today is the start of the new semester for @db.cs.cmu.edu's Intro to Database Systems! We're going harder into the material than before. More challenging projects but you can use LLMs to help. We also have 10min talks each Wed from leading DB companies: 15445.courses.cs.cmu.edu/fall2025
CMU 15-445/645 :: Intro to Database Systems (Fall 2025)
You want to know whether this is the premier course at Carnegie Mellon University on the design and implementation of database management systems? Well, it is. This course rips through data models (re...
15445.courses.cs.cmu.edu
August 25, 2025 at 2:29 PM
Reposted by Andy Pavlo
Launching my Programming Language Pragmatics talks! These short, accessible talks cover the material in the textbook, the 5th edition of which I wrote with Michael L. Scott. The first one introduces the topic and talks about why we study programming languages!

www.youtube.com/watch?v=hwL0...
PLP 1.1: Introduction to Programming Languages
YouTube video by Jonathan Aldrich
www.youtube.com
August 6, 2025 at 11:30 PM
The report of my death was an exaggeration. I am still alive and will be in SFO this week to speak about using LLMs to automatically tune databases. Wed Aug 6th @ 5:30pm at Databricks MTV: lu.ma/ha0dc4nj
August 4, 2025 at 10:10 AM
Reposted by Andy Pavlo
Attention, South Bay folk! We have The Databaseologist, @andypavlo.bsky.social, giving a talk in the bay on August 6th. Come join us for a great time in hearing:

ChatGPT Ain’t Got $%@& On Me! The Future of Automated Database Tuning

Register now! https://lu.ma/ha0dc4nj
South Bay Systems: ChatGPT Ain’t Got $%@& On Me! The Future of Automated Database Tuning · Luma
We're excited to feature Andy Pavlo, illustrious database professor at CMU, to talk about database tuning. This meetup's venue, food and drinks, are generously…
lu.ma
July 23, 2025 at 9:26 PM
At last @abigalekim.bsky.social's paper is out! It's the most complete eval of DB extensions/plugins ever. We analyze PostgreSQL, MySQL, MariaDB, SQLite, DuckDB, Redis.
TLDR: Postgres extns ecosystem is fraught with footguns. Other DBMSs have fewer extns but fewer problems. DuckDB has the cleanest API.
Vol:18 No:6 → Anarchy in the Database: A Survey and Evaluation of Database Management System Extensibility
👥 Authors: Abigale Kim, Marco Slot, David Andersen, Andrew Pavlo
📄 PDF: https://www.vldb.org/pvldb/vol18/p1962-kim.pdf
July 3, 2025 at 7:03 PM
People asked for the rest of the lecture videos for CMU-DB's optimizer course (15799.courses.cs.cmu.edu/spring2025). Unfortunately I got super sick and was in the hospital for 4 weeks. Thankfully @wslim.bsky.social + Jignesh taught the remaining lectures, but we didn't record those classes.
CMU 15-799 :: Special Topics in Databases: Query Optimization (Spring 2025)
This course is a hands-on exploration of the most challenging problem in computer science: database query optimization. It will cover the classical and state-of-the-art methods and algorithms for conv...
15799.courses.cs.cmu.edu
July 1, 2025 at 2:39 PM
Shots fired by @firebolthq.bsky.social with their new on-prem executable (www.firebolt.io/blog/introdu...). They have dethroned the Umbra system by The Germans™ at @tum.de in the ClickBench rankings: benchmark.clickhouse.com
June 24, 2025 at 11:10 PM