Andy Pavlo
andypavlo.bsky.social
Associate Prof. of Databases @ Carnegie Mellon.
TIL
November 10, 2025 at 4:34 PM
Reposted by Andy Pavlo
Great idea to compare plans across different systems using rows processed. A good yardstick, but slower sort-based plans from Postgres + MSSQL process fewer rows than faster hash-based plans from DuckDB. Postgres rows scanned also seem underreported. Nice to see some competition with ClickBench.
November 3, 2025 at 5:28 PM
I've been working on this for 15+ years and I think we're finally there.
An LLM reasoning agent decides which sub-agent to invoke based on the circumstances. Those sub-agents could use heuristics or ML. Our new holistic algorithm is not LLM-based, but we use LLMs to reduce training: arxiv.org/abs/2510.17748
This is Going to Sound Crazy, But What If We Used Large Language Models to Boost Automatic Database Tuning Algorithms By Leveraging Prior History? We Will Find Better Configurations More Quickly Than ...
Tuning database management systems (DBMSs) is challenging due to trillions of possible configurations and evolving workloads. Recent advances in tuning have led to breakthroughs in optimizing over the...
arxiv.org
October 24, 2025 at 3:18 AM
This is to make up for not naming my biological daughter "DROP TABLE students; --". My wife didn't go for it when I tried.
twitter.com/weschow/stat...
Wes Chow on X: "@andy_pavlo @DeepGenes If there was someone who could have actually pulled it off... I guess there's always the next one! Congrats! https://t.co/Xet1zRlNd8" / X
x.com
October 23, 2025 at 3:43 PM
The company is officially called "SO-YOU-DONT-HAVE-TO INCORPORATED'); DROP TABLE companies; --".
A lot of websites and the IRS don't like that name though.
We will announce more about it later this year. You can sign up to be on the waitlist: sydht.ai
SO-YOU-DONT-HAVE-TO INCORPORATED'); DROP TABLE companies; --
SO-YOU-DONT-HAVE-TO is a next-generation automated PostgreSQL optimization platform based on agentic artificial intelligence.
sydht.ai
October 23, 2025 at 3:09 PM
Reposted by Andy Pavlo
MMAP is incredibly fast when the dataset fits in memory, but it slows to a crawl when it doesn't, especially if the workload is mostly random point lookups. Speaking as someone who built an MMAP-based key-value store before :) Obligatory paper from @andypavlo.bsky.social db.cs.cmu.edu/mmap-cidr2022/
October 11, 2025 at 3:39 PM
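A minimal sketch of why point lookups over mmap behave this way, assuming a hypothetical on-disk layout of sorted, fixed-size (key, value) records. The names `write_store` and `lookup` are illustrative only. Every probe of the binary search reads from the mapping. If the file is already cached, each probe is just a memory read. If the touched page is not resident, the OS takes a page fault and the lookup blocks until the page is read from disk. The DBMS does not control when that happens or which pages get evicted.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical layout: fixed-size little-endian records (key: u64, value: u64), sorted by key.
RECORD = struct.Struct("<QQ")


def write_store(path, items):
    """Write (key, value) pairs to `path` as sorted fixed-size records."""
    with open(path, "wb") as f:
        for key, value in sorted(items):
            f.write(RECORD.pack(key, value))


def lookup(mm, key):
    """Binary-search the memory-mapped file for `key`; return its value or None.

    Each unpack_from() reads bytes through the mapping. A non-resident page
    means a page fault, and the OS performs a synchronous read from disk.
    """
    lo, hi = 0, len(mm) // RECORD.size
    while lo < hi:
        mid = (lo + hi) // 2
        k, v = RECORD.unpack_from(mm, mid * RECORD.size)
        if k == key:
            return v
        if k < key:
            lo = mid + 1
        else:
            hi = mid
    return None


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "kv.bin")
    write_store(path, [(i, i * i) for i in range(10_000)])
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        print(lookup(mm, 1234))    # 1522756
        print(lookup(mm, 99_999))  # None
```

With a small file like this, everything stays in the page cache, so lookups are fast. The slowdown described above only appears once the mapped file is much larger than available memory and the access pattern is random.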
Good. You need to build your strength back up to prep for your next challenge as CS dept chair!
October 4, 2025 at 1:02 AM
What are you talking about? MapReduce is the opposite of "moving compute to the data". It was all about moving/pulling the data to compute in a shared-disk architecture. See this old paper: dl.acm.org/doi/10.1145/...
A comparison of approaches to large-scale data analysis | Proceedings of the 2009 ACM SIGMOD International Conference on Management of data
dl.acm.org
October 2, 2025 at 2:32 PM
There was a collaboration attempt between CMU, Tsinghua, Meta, CWI, Nvidia, Voltron, & SpiralDB. But then lawyers got involved and it fell apart. Everyone released their own format:
→ Meta Nimble: github.com/facebookincu...
→ CWI FastLanes: github.com/cwida/FastLa...
→ SpiralDB Vortex: vortex.dev
GitHub - facebookincubator/nimble: New file format for storage of large columnar datasets.
github.com
October 1, 2025 at 1:49 PM
Our F3 files embed small WASM programs to decode data. If somebody creates a new encoding and the DBMS does not have a native impl, it can still read the data by running the embedded WASM decoder, which passes back Arrow buffers. Our experiments show WASM is 15-20% slower than native. We use @spiraldb.com's Vortex encoding impls.
October 1, 2025 at 1:49 PM
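A rough sketch of the reader-side fallback described in the F3 post. This is not F3's actual API. `ColumnChunk`, `NATIVE_DECODERS`, `decode`, and `delta_decode` are hypothetical names. A plain Python callable stands in for the embedded WASM module, and a list of ints stands in for the Arrow buffer. The reader prefers a native decoder and falls back to the decoder shipped inside the file only when it has no native implementation.

```python
from dataclasses import dataclass
from itertools import accumulate
from typing import Callable, Dict, List, Optional

# Stand-in decoder type: encoded values -> decoded values
# (a placeholder for the Arrow buffer a real decoder would return).
Decoder = Callable[[List[int]], List[int]]


@dataclass
class ColumnChunk:
    encoding: str                               # hypothetical encoding identifier
    payload: List[int]                          # encoded values
    embedded_decoder: Optional[Decoder] = None  # stand-in for the WASM module in the file


# Encodings this hypothetical reader implements natively.
NATIVE_DECODERS: Dict[str, Decoder] = {
    "plain": lambda payload: list(payload),
}


def decode(chunk: ColumnChunk) -> List[int]:
    native = NATIVE_DECODERS.get(chunk.encoding)
    if native is not None:
        return native(chunk.payload)                  # fast path: native impl
    if chunk.embedded_decoder is not None:
        return chunk.embedded_decoder(chunk.payload)  # fallback: decoder shipped with the file
    raise ValueError(f"unsupported encoding: {chunk.encoding}")


def delta_decode(payload: List[int]) -> List[int]:
    """Example 'new' encoding the reader lacks natively: prefix-sum of deltas."""
    return list(accumulate(payload))


if __name__ == "__main__":
    print(decode(ColumnChunk("plain", [1, 2, 3])))                                  # [1, 2, 3]
    print(decode(ColumnChunk("delta", [10, 1, 1, 5], embedded_decoder=delta_decode)))  # [10, 11, 12, 17]
```

In this sketch, the design choice is that the embedded decoder only runs when needed. A reader that later gains a native implementation of an encoding automatically takes the faster path.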
One problem with Parquet is that many implementations are not updated when the official spec improves. Everyone just uses the lowest-version feature set. That means if Parquet adds a better data encoding scheme and a file uses it, many common reader libraries won't be able to retrieve the data.
October 1, 2025 at 1:49 PM
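A small sketch of the "lowest-version feature set" problem from the Parquet post. The encoding names are real Parquet encodings. The two reader capability sets and the helper names `unreadable_columns` and `safe_encodings` are made up for illustration. A writer that must stay readable by every reader can only pick from the intersection of what all of them support. A file that uses a newer encoding becomes partly unreadable for an older reader.

```python
from typing import Dict, List, Set

# Hypothetical capability sets (which reader supports what is invented for this example).
LEGACY_READER: Set[str] = {"PLAIN", "RLE", "PLAIN_DICTIONARY"}
NEWER_READER: Set[str] = LEGACY_READER | {"RLE_DICTIONARY", "DELTA_BINARY_PACKED", "BYTE_STREAM_SPLIT"}


def unreadable_columns(file_encodings: Dict[str, List[str]], reader_supports: Set[str]) -> List[str]:
    """Columns that use at least one encoding the reader does not implement."""
    return sorted(col for col, encs in file_encodings.items() if not set(encs) <= reader_supports)


def safe_encodings(readers: List[Set[str]]) -> Set[str]:
    """Encodings a writer can use if every reader must decode the file (needs >= 1 reader)."""
    return set.intersection(*readers)


if __name__ == "__main__":
    footer = {"id": ["PLAIN", "RLE"], "price": ["BYTE_STREAM_SPLIT"]}
    print(unreadable_columns(footer, LEGACY_READER))  # ['price']
    print(unreadable_columns(footer, NEWER_READER))   # []
    print(sorted(safe_encodings([LEGACY_READER, NEWER_READER])))  # ['PLAIN', 'PLAIN_DICTIONARY', 'RLE']
```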
Shoot, I don't know how I missed that when I was copy-pasting. It wasn't intentional. Sorry :-(
September 18, 2025 at 2:18 PM