Frank McSherry
frankmcsherry.bsky.social
I wrote a bit about the "demand transform" for Datalog (and other similar languages): github.com/frankmcsherr...

The idea of the transform is that rather than eagerly produce all the data you might ever need, you can start from the seed crystals of concrete values and data that might yield output.
January 5, 2026 at 6:14 PM
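One way to picture the demand transform described in the post above is a minimal Python sketch, assuming a graph reachability program and a single query source. The relation names (`edge`, `reach`, `demand`) and both functions are hypothetical illustrations in the style of the classic magic-sets rewrite, not code from datatoad or the linked write-up.

```python
def fixpoint(reach, edges):
    """Apply reach(x, z) :- reach(x, y), edge(y, z) until nothing new appears."""
    reach = set(reach)
    while True:
        new = {(x, z) for (x, y) in reach for (y2, z) in edges if y == y2} - reach
        if not new:
            return reach
        reach |= new


def reach_eager(edges):
    """Eager evaluation: derive every reach(x, y) fact, whether or not it is needed."""
    # reach(x, y) :- edge(x, y).
    return fixpoint(edges, edges)


def reach_demanded(edges, seeds):
    """Demand-driven evaluation: only derive facts whose source was asked for."""
    demand = set(seeds)  # the "seed crystals" of concrete values (assumed query inputs)
    # reach(x, y) :- demand(x), edge(x, y).
    seeded = {(x, y) for (x, y) in edges if x in demand}
    return fixpoint(seeded, edges)


# Two disconnected chains; the query only asks about node 1.
edges = {(1, 2), (2, 3), (3, 4), (10, 11), (11, 12)}
eager = reach_eager(edges)
demanded = reach_demanded(edges, seeds={1})
```

In this sketch the demanded result matches the eager result filtered to source 1, but the second chain (10, 11, 12) is never explored.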
If anyone ever told you Datalog is elegant, they probably haven't used datatoad yet. The evenness condition is an interesting example of relation-rather-than-function, though!
December 27, 2025 at 5:01 PM
I went through and validated the output counts (numbers of facts) for most of the datatoad outputs. Some were hard to validate because datatoad sneaked in on my laptop but the other systems paged too hard, but everything seems correct.

github.com/frankmcsherr...
December 25, 2025 at 2:13 PM
I put up a new post about work that I am perhaps irrationally excited about: the combination of worst-case optimal joins with relational programming.

github.com/frankmcsherr...

The high-order bits are that relational programming is very neat, and it fits great with modern (WCO) relational joins.
December 23, 2025 at 5:23 PM
I wrote a post evaluating datatoad using the framework from the recent FlowLog paper: github.com/frankmcsherr....

It turns out it does well in some cases, worse in others, and has already improved by having other folks shine a light on its limitations by choosing problems and datasets I ignored!
December 14, 2025 at 8:28 PM
I wrote about datatoad as a "worst-case optimal Datalog", which .. perhaps doesn't typecheck. A previous result on streaming worst-case optimal joins seems to connect, and swapping "streaming" for "iterative" gives something that may be syntactically correct catnip.

github.com/frankmcsherr...
December 4, 2025 at 1:32 AM
Getting crisper; now within ~1s (14.8s v 13.7s) of my hand-optimized plan that uses only binary joins, but without any plan hints. Erm, mostly. Hoping to bypass it, as a win for WCO joins. :D
November 22, 2025 at 1:40 AM
I wrote a little bit about worst-case optimal joins in datatoad, support for which has just landed although there is a bit of tidying still to do before they are as crisp as the non-wco joins.

github.com/frankmcsherr...
November 21, 2025 at 6:20 PM
A short note on how one can perform joins of sums by accumulating at a point halfway through the join. Neither the input nor the output of the join, but smaller intermediate terms that show up when you peer closer at what a join does internally.

github.com/frankmcsherr...
November 17, 2025 at 10:15 AM
I wrote a bit about datatoad going "full columnar". All operations are now columnar; no rows are ever formed; everything is column-at-a-time.

github.com/frankmcsherr...

A bunch of interesting (to me) algorithms, and also some performance regressions, but then clawing back. I learned things!
November 2, 2025 at 8:19 PM
Looking forward to this!
October 25, 2025 at 1:22 PM
Reposted by Frank McSherry
New from Materialize: Cloud M.1 Clusters
Run 3x larger workloads with the same low latency and predictable performance—thanks to intelligent data spilling and expanded capacity.
Learn more: bit.ly/3L12oH2
Introducing New Materialize Cloud M.1 Clusters
Introducing a new Materialize Cloud cluster type. M.1 Clusters provide customers with more capacity, leading to better economics and performance, while maintaining the same low latency requirements th...
October 22, 2025 at 7:52 PM
Datatoad check-in: this time including some recent progress on columnar joins (good news: faster). Though, it also tries to roll up a bit of the sprawl of content I've scribbled, which increasingly feels like it needs some more careful curation to be helpful.
github.com/frankmcsherr...
October 9, 2025 at 6:29 PM
Good news on the Datalog front: v1 of "columnar joins" seems to work, and resulted in a 20% improvement (from 9.5s to 7.5s, for the joins of a reference workload). Still more gains from tightening it up, and potentially from columnar sorting, but I'll take a swing at writing things up tomorrow!
October 8, 2025 at 10:35 PM
What a difference an allocator makes!

This is the same Rust program first using the system allocator, and then using mimalloc. About 100MB of working set in both cases, just .. apparently it pilots the system allocator to some horrible behavior.

Obviously going to start using mimalloc from now on.
September 30, 2025 at 10:20 AM
Reposted by Frank McSherry
Welcome Frank McSherry @frankmcsherry.bsky.social to Sync Conf 2025. Pioneer of sync technology, inventor of Differential Dataflow, and founder of @materialize.com, Frank will trace the evolution of sync and stream processing.
September 19, 2025 at 2:30 PM
Reposted by Frank McSherry
Highlighting some of my team's recent work: We've changed Materialize to use swap instead of memory-mapped files, with nice performance and efficiency improvements.
September 18, 2025 at 2:24 PM
Reposted by Frank McSherry
We’ve released a major improvement to our memory spilling infrastructure:

Materialize now uses swap to scale SQL workloads beyond RAM.

✅ Faster hydration

✅ Efficient memory utilization

✅ Bigger workloads supported

Full post from @antiguru.bsky.social: bit.ly/46EF2iJ
September 18, 2025 at 1:58 PM
Very excited to bring some column-orientation to timely and differential. At least, removing baked in row-orientation in timely, and actual column-orientation in differential, with a bunch of cool learnings from the datatoad work. I hope. We'll see. :D
September 15, 2025 at 9:29 PM
Reposted by Frank McSherry
We just released Timely Dataflow 0.24! Here are some exciting changes from @frankmcsherry.bsky.social and myself.
The container abstractions got a complete rework, and we introduce a new pattern to distribute data. Details below.
github.com/TimelyDatafl...
Release timely-v0.24.0 · TimelyDataflow/timely-dataflow
This version of Timely has some exciting new features. The Distributor trait offers a generalization of the Exchange type. It allows users to define custom distribution strategies for routing data...
September 15, 2025 at 7:44 PM
I have a trip coming up, and I'm hoping to find some content to read about the implementations of (ideally interpreted) array languages. I'm on an interpreter kick, and armed with a bunch of column-oriented libraries.

Any tips, drop a reply!
August 29, 2025 at 5:25 PM
I wrote a bit about datatoad's columnar logic for relational operators. At least, for union, intersection, antijoins, and semijoins. It turns out the joins are all easy; it's projection that is hard, of all things. Go figure.

github.com/frankmcsherr...
August 24, 2025 at 8:20 PM
If you are in SF in November, I'll be speaking at syncconf.dev (@syncconf.bsky.social)!

It's an excellent confluence of all things up-to-data. Architectures like MZ at the backend, connected via sync engines, and front ends that don't waste anyone's time waiting on database queries.
Sync Conf | Nov 12, 2025 in San Francisco.
Sync Conf is a boutique conference on the future of real-time, collaborative, agentic software development. Happening Nov 12, 2025 in San Francisco.
August 19, 2025 at 11:09 PM
In Datalog news: I had given up on getting (compiled) datafrog numbers for the "alias analysis" problem, because it is tedious to write. But thanks to an anonymous benefactor, it was coded up and we can now make a comparison between compiled datafrog and interpreted datatoad, on the same problem!
August 14, 2025 at 11:07 PM