Frank McSherry
frankmcsherry.bsky.social
I wrote a bit about datatoad going "full columnar". All operations are now columnar; no rows are ever formed; everything is column-at-a-time.

github.com/frankmcsherr...

A bunch of interesting (to me) algorithms, and also some performance regressions, but then clawing back. I learned things!
November 2, 2025 at 8:19 PM
Looking forward to this!
October 25, 2025 at 1:22 PM
Reposted by Frank McSherry
New from Materialize: Cloud M.1 Clusters
Run 3x larger workloads with the same low latency and predictable performance—thanks to intelligent data spilling and expanded capacity.
Learn more: bit.ly/3L12oH2
Introducing New Materialize Cloud M.1 Clusters
Introducing a new Materialize Cloud cluster type. M.1 Clusters provide customers with more capacity, leading to better economics and performance, while maintaining the same low latency requirements th...
bit.ly
October 22, 2025 at 7:52 PM
Datatoad check-in: this time including some recent progress on columnar joins (good news: faster). Though, it also tries to roll up a bit of the sprawl of content I've scribbled, which increasingly feels like it needs some more careful curation to be helpful.
github.com/frankmcsherr...
October 9, 2025 at 6:29 PM
Good news on the Datalog front: v1 of "columnar joins" seems to work, and resulted in a 20% improvement (from 9.5s to 7.5s, for the joins of a reference workload). Still more gains from tightening it up, and potentially from columnar sorting, but I'll take a swing at writing things up tomorrow!
October 8, 2025 at 10:35 PM
What a difference an allocator makes!

This is the same Rust program first using the system allocator, and then using mimalloc. About 100MB of working set in both cases, just .. apparently it pilots the system allocator to some horrible behavior.

Obviously going to start using mimalloc from now on.
September 30, 2025 at 10:20 AM
Reposted by Frank McSherry
Welcome Frank McSherry @frankmcsherry.bsky.social to Sync Conf 2025. Pioneer of sync technology, inventor of Differential Dataflow, and founder of @materialize.com, Frank will trace the evolution of sync and stream processing.
September 19, 2025 at 2:30 PM
Reposted by Frank McSherry
Highlighting some of my team's recent work: We've changed Materialize to use swap instead of memory-mapped files, with nice performance and efficiency improvements.
We’ve released a major improvement to our memory spilling infrastructure:

Materialize now uses swap to scale SQL workloads beyond RAM.

✅ Faster hydration

✅ Efficient memory utilization

✅ Bigger workloads supported

Full post from antiguru.bsky.social: bit.ly/46EF2iJ
September 18, 2025 at 2:24 PM
Reposted by Frank McSherry
We’ve released a major improvement to our memory spilling infrastructure:

Materialize now uses swap to scale SQL workloads beyond RAM.

✅ Faster hydration

✅ Efficient memory utilization

✅ Bigger workloads supported

Full post from antiguru.bsky.social: bit.ly/46EF2iJ
September 18, 2025 at 1:58 PM
Very excited to bring some column-orientation to timely and differential. At least, removing baked-in row-orientation in timely, and actual column-orientation in differential, with a bunch of cool learnings from the datatoad work. I hope. We'll see. :D
September 15, 2025 at 9:29 PM
Reposted by Frank McSherry
We just released Timely Dataflow 0.24! Here are some exciting changes from @frankmcsherry.bsky.social and myself.
The container abstractions got a complete rework, and we introduce a new pattern to distribute data. Details below.
github.com/TimelyDatafl...
Release timely-v0.24.0 · TimelyDataflow/timely-dataflow
This version of Timely has some exciting new features. The Distributor trait offers a generalization of the Exchange type. It allows users to define custom distribution strategies for routing data...
github.com
September 15, 2025 at 7:44 PM
I have a trip coming up, and I'm hoping to find some content to read about the implementations of (ideally interpreted) array languages. I'm on an interpreter kick, and armed with a bunch of column-oriented libraries.

Any tips, drop a reply!
August 29, 2025 at 5:25 PM
I wrote a bit about datatoad's columnar logic for relational operators. At least, for union, intersection, antijoins, and semijoins. It turns out the joins are all easy; it's projection that is hard, of all things. Go figure.

github.com/frankmcsherr...
August 24, 2025 at 8:20 PM
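For readers wondering what a column-at-a-time semijoin and antijoin might look like, here is a minimal Rust sketch. It is not datatoad's code: the function names `semi_anti` and `gather`, the `u32` key type, and the assumption that both key columns arrive sorted are all illustrative choices. The idea is to merge the two key columns once, produce index lists, and then apply those indices to any other column without ever assembling a row.

```rust
/// Walks a sorted `left` key column against a sorted `right` key column.
/// Returns the positions in `left` whose key appears in `right` (semijoin)
/// and the positions whose key does not (antijoin).
/// Hypothetical helper for illustration; assumes both inputs are sorted.
fn semi_anti(left: &[u32], right: &[u32]) -> (Vec<usize>, Vec<usize>) {
    let mut semi = Vec::new();
    let mut anti = Vec::new();
    let mut j = 0;
    for (i, key) in left.iter().enumerate() {
        // Advance the right cursor past keys smaller than the current left key.
        while j < right.len() && right[j] < *key {
            j += 1;
        }
        if j < right.len() && right[j] == *key {
            semi.push(i);
        } else {
            anti.push(i);
        }
    }
    (semi, anti)
}

/// Applies a list of positions to one column, producing a new column.
fn gather<T: Clone>(column: &[T], indices: &[usize]) -> Vec<T> {
    indices.iter().map(|&i| column[i].clone()).collect()
}

fn main() {
    // A two-column relation stored as two parallel columns.
    let keys = [1u32, 2, 2, 4, 7];
    let vals = ["a", "b", "c", "d", "e"];
    // The key column of the relation we semijoin and antijoin against.
    let other = [2u32, 3, 7];

    let (semi, anti) = semi_anti(&keys, &other);
    println!("semijoin keys: {:?}", gather(&keys, &semi));
    println!("semijoin vals: {:?}", gather(&vals, &semi));
    println!("antijoin keys: {:?}", gather(&keys, &anti));
    println!("antijoin vals: {:?}", gather(&vals, &anti));
}
```

The index lists are the only intermediate state, so each remaining column can be gathered independently.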
If you are in SF in November, I'll be speaking at syncconf.dev (@syncconf.bsky.social)!

It's an excellent confluence of all things up-to-data. Architectures like MZ at the backend, connected via sync engines, and front ends that don't waste anyone's time waiting on database queries.
Sync Conf | Nov 12, 2025 in San Francisco.
Sync Conf is a boutique conference on the future of real-time, collaborative, agentic software development. Happening Nov 12, 2025 in San Francisco.
syncconf.dev
August 19, 2025 at 11:09 PM
In Datalog news: I had given up on getting (compiled) datafrog numbers for the "alias analysis" problem, because it is tedious to write. But thanks to an anonymous benefactor, it was coded up and we can now make a comparison between compiled datafrog and interpreted datatoad, on the same problem!
August 14, 2025 at 11:07 PM
I wrote about the projects done at Materialize’s recent hackathon. Many very cool projects, and also one that I worked on; take a read!

materialize.com/blog/spring_...
August 13, 2025 at 9:59 PM
A neat new Materialize post from our QA department on speeding up CI. materialize.com/blog/speedin...
Speeding up Materialize CI
How we slashed CI runtime for Materialize by up to 86% through smarter builds, caching, parallelization, and clever tooling.
materialize.com
August 8, 2025 at 9:22 PM
Datalog weekend: we graduate to queries with cyclic rules, non-binary relations, and generally more interesting behavior. In particular, we're going to compare ourselves against interpreted Soufflé; a standard reference point!

How does interpreted datatoad compare?

github.com/frankmcsherr...
August 2, 2025 at 5:37 PM
We have a new @materialize.com post up, this time about pushing selection predicates into our persistence layer.

Materialize hybridizes batch and streaming computation (it does both, regularly), and draws on the best optimizations of each (in this case, CDWs).

materialize.com/blog/how-fil...
How filter pushdown works
Using part statistics and abstract interpretation to push complex filters all the way down to the storage layer.
materialize.com
July 31, 2025 at 11:56 PM
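As a rough illustration of the idea in that post, the Rust sketch below keeps min/max statistics per stored part and evaluates a predicate against the statistics before touching any rows. Everything here is an assumption for the sake of example, not Materialize's implementation: the `Part` and `Verdict` types, the single `x > threshold` predicate, and the `i64` column are all hypothetical. The three-way verdict is a tiny instance of abstract interpretation: the predicate is evaluated over a range instead of a value.

```rust
/// A stored chunk of one integer column plus its min/max statistics.
/// Hypothetical type for illustration only.
struct Part {
    min: i64,
    max: i64,
    rows: Vec<i64>,
}

impl Part {
    /// Builds a part and computes its statistics; returns None for empty input.
    fn new(rows: Vec<i64>) -> Option<Part> {
        let min = *rows.iter().min()?;
        let max = *rows.iter().max()?;
        Some(Part { min, max, rows })
    }
}

/// What the statistics alone can tell us about `x > threshold` for a part.
#[derive(Debug, PartialEq)]
enum Verdict {
    AllMatch,
    NoneMatch,
    Unknown,
}

/// Evaluates `x > threshold` over the range [min, max] instead of over rows.
fn interpret_greater_than(part: &Part, threshold: i64) -> Verdict {
    if part.min > threshold {
        Verdict::AllMatch
    } else if part.max <= threshold {
        Verdict::NoneMatch
    } else {
        Verdict::Unknown
    }
}

/// Returns the matching values and the number of parts skipped outright.
fn scan_greater_than(parts: &[Part], threshold: i64) -> (Vec<i64>, usize) {
    let mut out = Vec::new();
    let mut skipped = 0;
    for part in parts {
        match interpret_greater_than(part, threshold) {
            // The statistics prove no row matches: never read the rows.
            Verdict::NoneMatch => skipped += 1,
            // Every row matches: copy without evaluating the predicate per row.
            Verdict::AllMatch => out.extend_from_slice(&part.rows),
            // Inconclusive: fall back to filtering row by row.
            Verdict::Unknown => {
                out.extend(part.rows.iter().copied().filter(|&x| x > threshold))
            }
        }
    }
    (out, skipped)
}

fn main() {
    let parts: Vec<Part> = vec![vec![1, 5, 9], vec![10, 20], vec![30, 40]]
        .into_iter()
        .filter_map(Part::new)
        .collect();
    let (rows, skipped) = scan_greater_than(&parts, 15);
    println!("rows: {:?}, parts skipped: {}", rows, skipped);
}
```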
Reposted by Frank McSherry
Nice that the Bluesky firehose is now becoming a live dataset on which to demo streaming databases
We have a new blog post up at @materialize.com about analyzing the Bluesky firehose (Jetstream, really) through Materialize. You can grab a copy of the community edition of MZ and follow along, or invent your own ways of looking at the data, live!

materialize.com/blog/analyzi...
Analyzing Live Social Data: Exploring Social Trends on Bluesky
Bluesky provides a public firehose that we can stream into Materialize, through which we can observe live social behavior and trends.
materialize.com
July 29, 2025 at 11:25 AM
We have a weekend Datalog update: datatoad now comes with tries! I took the long route to get here, and have a bit of other work in the pipeline as part of all this. But I'm happy to report that memory went down by another 2x (expected), and runtime went down by ~1.6x.

github.com/frankmcsherr...
July 20, 2025 at 7:39 PM
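To make "tries" concrete, here is a small Rust sketch of one common way to lay out a two-column relation as a trie over flat columns. It is an assumption-laden illustration and not datatoad's actual layout: the `Trie` struct, its field names, and the `u32` columns are hypothetical. Each distinct key is stored once, with a `bounds` column recording where that key's run of values ends.

```rust
/// A two-layer trie over sorted (key, value) pairs, stored as flat columns.
/// Hypothetical layout for illustration only.
struct Trie {
    /// Each distinct key, in sorted order.
    keys: Vec<u32>,
    /// bounds[i] is the end (exclusive) of key i's values within `vals`.
    bounds: Vec<usize>,
    /// All values, grouped by key.
    vals: Vec<u32>,
}

impl Trie {
    /// Builds the trie from pairs that are assumed sorted by (key, value).
    fn build(pairs: &[(u32, u32)]) -> Trie {
        let mut keys = Vec::new();
        let mut bounds = Vec::new();
        let mut vals = Vec::new();
        for &(k, v) in pairs {
            if keys.last() != Some(&k) {
                // A new key starts: close the previous key's run of values.
                if !keys.is_empty() {
                    bounds.push(vals.len());
                }
                keys.push(k);
            }
            vals.push(v);
        }
        // Close the final key's run.
        if !keys.is_empty() {
            bounds.push(vals.len());
        }
        Trie { keys, bounds, vals }
    }

    /// Returns the values associated with `key`, or an empty slice.
    fn values(&self, key: u32) -> &[u32] {
        match self.keys.binary_search(&key) {
            Ok(i) => {
                let start = if i == 0 { 0 } else { self.bounds[i - 1] };
                &self.vals[start..self.bounds[i]]
            }
            Err(_) => &[],
        }
    }
}

fn main() {
    let pairs = [(1, 10), (1, 11), (2, 10), (5, 7), (5, 8), (5, 9)];
    let trie = Trie::build(&pairs);
    println!("keys:   {:?}", trie.keys);
    println!("bounds: {:?}", trie.bounds);
    println!("vals:   {:?}", trie.vals);
    println!("values for key 5: {:?}", trie.values(5));
}
```

The saving comes from repeated keys: six pairs carry six key entries in a flat layout, while the trie stores three keys plus three bounds, and the gap widens as keys repeat more often.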
We have a new blog post up at @materialize.com about analyzing the Bluesky firehose (Jetstream, really) through Materialize. You can grab a copy of the community edition of MZ and follow along, or invent your own ways of looking at the data, live!

materialize.com/blog/analyzi...
Analyzing Live Social Data: Exploring Social Trends on Bluesky
Bluesky provides a public firehose that we can stream into Materialize, through which we can observe live social behavior and trends.
materialize.com
July 16, 2025 at 11:51 AM
Reposted by Frank McSherry
I had a great time at Kris's workshop in May! Lots of inspiring talks and discussions. He has posted a mega-video of all the recorded talks, I highly recommend checking some of them out if you're into Datalog, logic programming, or incrementalization. There is even some e-graph stuff in there!
May 25-27, 2025, I hosted an event, the "Minnowbrook Logic Programming Seminar," in Blue Mountain Lake, NY. I recorded 11 talks on Datalog-related interests, totaling 9+ hours of video, which I have just now published on YouTube youtu.be/3ec9VfMUVa8
Minnowbrook Logic Programming Seminar (Supercut w/ Extras)
YouTube video by Kristopher Micinski
youtu.be
July 8, 2025 at 4:50 PM
Folks! The workshop at Minnowbrook that prompted the datatoad work has put up the videos from the talks. NINE HOURS! Of logic programming content! I liked the talks; they are chaptered; you can see @mwillsey.com talk about e-graphs; why haven't you clicked yet???

www.youtube.com/watch?v=3ec9...
Minnowbrook Logic Programming Seminar (Supercut w/ Extras)
YouTube video by Kristopher Micinski
www.youtube.com
July 9, 2025 at 12:58 AM