Lightnews — Scholar-powered news

Frank McSherry

@frankmcsherry.bsky.social

I wrote a bit about the "demand transform" for Datalog (and other similar languages): github.com/frankmcsherr...

The idea of the transform is that rather than eagerly produce all the data you might ever need, you can start from the seed crystals of concrete values and data that might yield output.
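To make the idea concrete, here is a minimal Rust sketch on a reachability query. It assumes the textbook shape of the transform and is not datatoad's implementation. `reach_eager` derives `reach(x, y)` for every source. `reach_demanded` adds a `demand(x)` relation seeded only with the sources someone actually asked about, so only those sources ever enter the fixpoint. The function names and the naive (not semi-naive) fixpoint are illustrative choices.

```rust
use std::collections::BTreeSet;

/// Close `reach` under `reach(x, z) :- reach(x, y), edge(y, z).` (naive fixpoint).
fn close(mut reach: BTreeSet<(u32, u32)>, edges: &[(u32, u32)]) -> BTreeSet<(u32, u32)> {
    loop {
        let mut derived = Vec::new();
        for &(x, y) in &reach {
            for &(a, b) in edges {
                if a == y && !reach.contains(&(x, b)) {
                    derived.push((x, b));
                }
            }
        }
        if derived.is_empty() {
            return reach;
        }
        reach.extend(derived);
    }
}

/// Eager: `reach(x, y) :- edge(x, y).` Every source gets all of its facts.
fn reach_eager(edges: &[(u32, u32)]) -> BTreeSet<(u32, u32)> {
    close(edges.iter().copied().collect(), edges)
}

/// Demand-transformed: `reach(x, y) :- demand(x), edge(x, y).`
/// Only sources named in `demand` ever appear in the first column.
fn reach_demanded(edges: &[(u32, u32)], demand: &BTreeSet<u32>) -> BTreeSet<(u32, u32)> {
    let seeds = edges.iter().copied().filter(|(x, _)| demand.contains(x)).collect();
    close(seeds, edges)
}
```

With edges 1→2→3→4 and 10→11 and `demand = {1}`, the eager version derives 7 facts while the demanded one derives only the 3 that start at 1.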

January 5, 2026 at 6:14 PM

Frank McSherry

@frankmcsherry.bsky.social

If anyone ever told you Datalog is elegant, they probably haven't used datatoad yet. The evenness condition is an interesting example of relation-rather-than-function, though!

A Datalog program for determining iterates of the Collatz conjecture.
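One reading of "relation-rather-than-function" is sketched below in Rust as a hypothetical illustration, not the program in the image. Instead of computing `n % 2`, evenness is membership in a finite relation `half(n, m)` that holds when `n = 2m`. An even `n` steps by joining with `half`, and anything absent from `half` is treated as odd, which is an antijoin. Because the relation is finite, the sketch assumes `seed <= 2 * bound` and does not follow iterates above `2 * bound`.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Collatz iterates reachable from `seed`, deciding evenness by a relation
/// lookup rather than by computing `n % 2`.
/// Assumes `seed <= 2 * bound`; iterates above `2 * bound` are not followed.
fn collatz_iterates(seed: u64, bound: u64) -> BTreeSet<u64> {
    // half(2m, m) for 1 <= m <= bound: a finite relation, keyed by its first column.
    let half: BTreeMap<u64, u64> = (1..=bound).map(|m| (2 * m, m)).collect();
    // reach(seed).
    let mut reach = BTreeSet::from([seed]);
    let mut frontier = vec![seed];
    while let Some(n) = frontier.pop() {
        let next = match half.get(&n) {
            // reach(m) :- reach(n), half(n, m).
            Some(&m) => m,
            // reach(3n + 1) :- reach(n), !half(n, _).  (absent from `half` means odd)
            None => 3 * n + 1,
        };
        // Stay inside the finite domain of `half`; `insert` returns false for repeats.
        if next <= 2 * bound && reach.insert(next) {
            frontier.push(next);
        }
    }
    reach
}
```

For example, `collatz_iterates(6, 8)` visits 6, 3, 10, 5, 16, 8, 4, 2, 1 and stops when 1 leads back to the already-seen 4.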

December 27, 2025 at 5:01 PM

Frank McSherry

@frankmcsherry.bsky.social

I went through and validated the output counts (numbers of facts) for most of the datatoad outputs. Some were hard to validate because datatoad squeaked by on my laptop while the other systems paged too hard, but everything seems correct.

github.com/frankmcsherr...

December 25, 2025 at 2:13 PM

Frank McSherry

@frankmcsherry.bsky.social

I put up a new post about work that I am perhaps irrationally excited about: the combination of worst-case optimal joins with relational programming.

github.com/frankmcsherr...

The high-order bits are that relational programming is very neat, and it fits great with modern (WCO) relational joins.
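A small Rust sketch of what "relational" buys, assuming a relation is just a finite set of tuples. `plus(a, b, c)` has no designated inputs, so the same facts answer both "what is 2 + 3?" and "which pairs sum to 3?". Conjunctions of relations are joins on shared variables; `solve` uses a plain nested-loop join where a WCO join would be the better engine. The names `plus`, `query`, and `solve` are hypothetical.

```rust
/// A relation is a set of tuples with no designated inputs or outputs.
type Rel = Vec<(u32, u32, u32)>;

/// plus(a, b, c) holds when a + b = c, materialized for c <= limit.
fn plus(limit: u32) -> Rel {
    let mut rel = Vec::new();
    for a in 0..=limit {
        for b in 0..=limit - a {
            rel.push((a, b, a + b));
        }
    }
    rel
}

/// Query with any subset of columns bound; `None` leaves a column free.
fn query(rel: &Rel, a: Option<u32>, b: Option<u32>, c: Option<u32>) -> Rel {
    let ok = |bound: Option<u32>, v: u32| bound.map_or(true, |x| x == v);
    rel.iter()
        .copied()
        .filter(|&(x, y, z)| ok(a, x) && ok(b, y) && ok(c, z))
        .collect()
}

/// The conjunction plus(a, b, total), plus(b, b, a): a join on shared variables a and b.
/// A nested-loop join is used here for brevity.
fn solve(rel: &Rel, total: u32) -> Vec<(u32, u32)> {
    query(rel, None, None, Some(total))
        .into_iter()
        .filter(|&(a, b, _)| !query(rel, Some(b), Some(b), Some(a)).is_empty())
        .map(|(a, b, _)| (a, b))
        .collect()
}
```

Running `plus` "forwards" with `a = 2, b = 3` yields only `(2, 3, 5)`, running it "backwards" with `c = 3` yields four pairs, and `solve(&plus(10), 6)` finds the single solution `a = 4, b = 2`.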

December 23, 2025 at 5:23 PM

Frank McSherry

@frankmcsherry.bsky.social

I wrote a post evaluating datatoad using the framework from the recent FlowLog paper: github.com/frankmcsherr....

It turns out it does well in some cases, worse in others, and has already improved by having other folks shine a light on its limitations by choosing problems and datasets I ignored!

December 14, 2025 at 8:28 PM

Frank McSherry

@frankmcsherry.bsky.social

I wrote about datatoad as a "worst-case optimal Datalog", which .. perhaps doesn't typecheck. A previous result on streaming worst-case optimal joins seems to connect, and swapping "streaming" for "iterative" gives something that may be syntactically correct catnip.

github.com/frankmcsherr...

December 4, 2025 at 1:32 AM

Frank McSherry

@frankmcsherry.bsky.social

Getting crisper; now within ~1s (14.8s v 13.7s) of my hand-optimized plan that uses only binary joins, but without any plan hints. Erm, mostly. Hoping to surpass it, as a win for WCO joins. :D

Frank McSherry @frankmcsherry.bsky.social · Nov 21

I wrote a little bit about worst-case optimal joins in datatoad, support for which has just landed although there is a bit of tidying still to do before they are as crisp as the non-wco joins.

github.com/frankmcsherr...

November 22, 2025 at 1:40 AM

Frank McSherry

@frankmcsherry.bsky.social

I wrote a little bit about worst-case optimal joins in datatoad, support for which has just landed although there is a bit of tidying still to do before they are as crisp as the non-wco joins.

github.com/frankmcsherr...
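For readers new to WCO joins, here is a minimal Rust sketch of the generic-join pattern on the triangle query, assuming directed edges and `BTreeSet` adjacency lists. It illustrates the technique and is not datatoad's code. Rather than materializing `edge(a, b) ⋈ edge(b, c)` and filtering it afterwards, it binds one variable at a time and takes each new variable's candidates from the intersection of every relation that mentions it.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Sorted adjacency lists: `adj[x]` holds every y with edge(x, y).
fn index(edges: &[(u32, u32)]) -> BTreeMap<u32, BTreeSet<u32>> {
    let mut adj: BTreeMap<u32, BTreeSet<u32>> = BTreeMap::new();
    for &(x, y) in edges {
        adj.entry(x).or_default().insert(y);
    }
    adj
}

/// triangle(a, b, c) :- edge(a, b), edge(b, c), edge(a, c).
/// Variables are bound in the order a, b, c.
fn triangles(edges: &[(u32, u32)]) -> Vec<(u32, u32, u32)> {
    let adj = index(edges);
    let empty = BTreeSet::new();
    let mut out = Vec::new();
    for (&a, a_out) in &adj {
        // Candidates for b come from edge(a, b) alone.
        for &b in a_out {
            let b_out = adj.get(&b).unwrap_or(&empty);
            // Candidates for c must satisfy edge(b, c) AND edge(a, c): intersect the
            // two lists instead of enumerating every edge(a, b), edge(b, c) path.
            for &c in a_out.intersection(b_out) {
                out.push((a, b, c));
            }
        }
    }
    out
}
```

`BTreeSet::intersection` stands in for a proper intersection primitive here; the worst-case optimal bound relies on intersections costing roughly the size of the smaller input, which this only approximates.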

November 21, 2025 at 6:20 PM

Frank McSherry

@frankmcsherry.bsky.social

A short note on how one can perform joins of sums by accumulating at a point halfway through the join. Neither the input nor the output of the join, but smaller intermediate terms that show up when you peer closer at what a join does internally.

github.com/frankmcsherr...
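A hedged Rust sketch of one reading of "accumulating halfway", assuming weighted relations `r(a, b; w)` and `s(b, c; w)` and an output `out(a)` that sums `w_r * w_s` over the join with `c` projected away. Because sums distribute over products, out(a) = Σ_b r(a, b) · (Σ_c s(b, c)), so `s` can be collapsed to one total per join key `b` before the join; that per-`b` total is neither an input nor the output. This is a swapped-in textbook example, not necessarily the construction in the post, and the names are hypothetical.

```rust
use std::collections::BTreeMap;

/// Weighted facts: two key columns and a weight.
type Weighted = [(u32, u32, i64)];

/// Baseline: enumerate each joined (a, b, c) row, then sum w_r * w_s into out(a).
fn join_then_sum(r: &Weighted, s: &Weighted) -> BTreeMap<u32, i64> {
    let mut out = BTreeMap::new();
    for &(a, b1, w1) in r {
        for &(b2, _c, w2) in s {
            if b1 == b2 {
                *out.entry(a).or_insert(0) += w1 * w2;
            }
        }
    }
    out
}

/// Accumulate mid-join: `c` never reaches the output, so collapse `s` to one
/// total per join key `b` first. Distributivity makes the results equal.
fn sum_halfway(r: &Weighted, s: &Weighted) -> BTreeMap<u32, i64> {
    let mut s_by_b: BTreeMap<u32, i64> = BTreeMap::new();
    for &(b, _c, w) in s {
        *s_by_b.entry(b).or_insert(0) += w;
    }
    let mut out = BTreeMap::new();
    for &(a, b, w1) in r {
        if let Some(&w2) = s_by_b.get(&b) {
            *out.entry(a).or_insert(0) += w1 * w2;
        }
    }
    out
}
```

The baseline touches every matching `(r, s)` pair, while `sum_halfway` touches each `s` fact once and each `r` fact once.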

November 17, 2025 at 10:15 AM

Frank McSherry

@frankmcsherry.bsky.social

I wrote a bit about datatoad going "full columnar". All operations are now columnar; no rows are ever formed; everything is column-at-a-time.

github.com/frankmcsherr...

A bunch of interesting (to me) algorithms, and also some performance regressions, but then clawing back. I learned things!
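To illustrate "no rows are ever formed", here is a minimal Rust sketch, assuming a relation held as one `Vec<u32>` per column (the `Columns` type is hypothetical, not datatoad's). Sorting and deduplicating compute a permutation by comparing one column at a time, then gather each column through that permutation; no tuple holding a whole row is ever built.

```rust
use std::cmp::Ordering;

/// A relation stored column-at-a-time: `cols[k][i]` is field `k` of row `i`.
struct Columns {
    cols: Vec<Vec<u32>>,
}

impl Columns {
    fn len(&self) -> usize {
        self.cols.first().map_or(0, |c| c.len())
    }

    /// Sort rows lexicographically and drop duplicates without building row tuples:
    /// compare column by column to order a permutation, then gather every column through it.
    fn sort_dedup(&mut self) {
        let cols = &self.cols;
        let cmp = |i: usize, j: usize| {
            cols.iter()
                .map(|c| c[i].cmp(&c[j]))
                .find(|o| o.is_ne())
                .unwrap_or(Ordering::Equal)
        };
        let mut perm: Vec<usize> = (0..self.len()).collect();
        perm.sort_by(|&i, &j| cmp(i, j));
        perm.dedup_by(|i, j| cmp(*i, *j) == Ordering::Equal);
        let sorted: Vec<Vec<u32>> = cols
            .iter()
            .map(|c| perm.iter().map(|&i| c[i]).collect())
            .collect();
        self.cols = sorted;
    }
}
```

Rows `(3,0), (1,5), (3,0), (1,2)` come out as columns `[1, 1, 3]` and `[2, 5, 0]`, i.e. the rows `(1,2), (1,5), (3,0)`.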

November 2, 2025 at 8:19 PM

Frank McSherry

@frankmcsherry.bsky.social

Looking forward to this!

Sync Conf @syncconf.bsky.social · Oct 24

Featuring
👉 @notion.com
👉 @figma.com
👉 @swyx.io
👉 @threepointone.bsky.social
👉 @frankmcsherry.bsky.social
👉 @kyle.bricolage.io
👉 @jamescowling.dev
👉 @aaronboodman.com
👉 @schickling.dev
👉 @anselm.io
👉 @adamwiggins.bsky.social
👉 @powersync.com
👉 @b5.bsky.social
👉 @f0a.org
👉 @cdata.earth

October 25, 2025 at 1:22 PM

Reposted by Frank McSherry

Materialize

@materialize.com

New from Materialize: Cloud M.1 Clusters
Run 3x larger workloads with the same low latency and predictable performance—thanks to intelligent data spilling and expanded capacity.
Learn more: bit.ly/3L12oH2

Introducing New Materialize Cloud M.1 Clusters

Introducing a new Materialize Cloud cluster type. M.1 Clusters provide customers with more capacity, leading to better economics and performance, while maintaining the same low latency requirements th...

October 22, 2025 at 7:52 PM

Frank McSherry

@frankmcsherry.bsky.social

Datatoad check-in: this time including some recent progress on columnar joins (good news: faster). Though, it also tries to roll up a bit of the sprawl of content I've scribbled, which increasingly feels like it needs some more careful curation to be helpful.

github.com/frankmcsherr...

October 9, 2025 at 6:29 PM

Frank McSherry

@frankmcsherry.bsky.social

Good news on the Datalog front: v1 of "columnar joins" seems to work, and resulted in a 20% improvement (from 9.5s to 7.5s, for the joins of a reference workload). Still more gains from tightening it up, and potentially from columnar sorting, but I'll take a swing at writing things up tomorrow!

October 8, 2025 at 10:35 PM

Frank McSherry

@frankmcsherry.bsky.social

What a difference an allocator makes!

This is the same Rust program first using the system allocator, and then using mimalloc. About 100MB of working set in both cases, just .. apparently it drives the system allocator into some horrible behavior.

Obviously going to start using mimalloc from now on.

Runtimes of the same application with two different allocators; mimalloc is nearly 100x faster.
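Swapping allocators in Rust comes down to a single `#[global_allocator]` static. The sketch below names the system allocator explicitly so it compiles with no dependencies; the comment shows the replacement lines documented by the `mimalloc` crate. The `churn` workload is a hypothetical stand-in for allocation-heavy code, not the program from the post, so it will not reproduce the speedup shown.

```rust
use std::alloc::System;

// The allocator behind Box, Vec, String, etc. is chosen by one `#[global_allocator]`
// static. Here it is the system allocator, named explicitly. To switch to mimalloc,
// add the `mimalloc` crate as a dependency and replace the two lines below with:
//     #[global_allocator]
//     static GLOBAL: mimalloc::MiMalloc = mimalloc::MiMalloc;
#[global_allocator]
static GLOBAL: System = System;

/// Hypothetical allocation-heavy workload: many small, short-lived vectors.
fn churn(rounds: usize) -> usize {
    let mut total = 0;
    for i in 0..rounds {
        let v: Vec<u64> = (0..(i % 64) as u64).collect();
        total += v.len();
    }
    total
}
```

Nothing else in the program changes when the allocator is swapped, which makes it a cheap experiment to run before reaching for deeper optimizations.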

September 30, 2025 at 10:20 AM

Reposted by Frank McSherry

Sync Conf

@syncconf.bsky.social

Welcome Frank McSherry @frankmcsherry.bsky.social to Sync Conf 2025. Pioneer of sync technology, inventor of Differential Dataflow, and founder of @materialize.com, Frank will trace the evolution of sync and stream processing.

September 19, 2025 at 2:30 PM

Reposted by Frank McSherry

Moritz Hoffmann

@antiguru.bsky.social

Highlighting some of my team's recent work: We've changed Materialize to use swap instead of memory-mapped files, with nice performance and efficiency improvements.

Materialize @materialize.com · Sep 18

We’ve released a major improvement to our memory spilling infrastructure:

Materialize now uses swap to scale SQL workloads beyond RAM.

✅ Faster hydration

✅ Efficient memory utilization

✅ Bigger workloads supported

Full post from antiguru.bsky.social → bit.ly/46EF2iJ

September 18, 2025 at 2:24 PM

Reposted by Frank McSherry

Materialize

@materialize.com

We’ve released a major improvement to our memory spilling infrastructure:

Materialize now uses swap to scale SQL workloads beyond RAM.

✅ Faster hydration

✅ Efficient memory utilization

✅ Bigger workloads supported

Full post from antiguru.bsky.social → bit.ly/46EF2iJ

September 18, 2025 at 1:58 PM

Frank McSherry

@frankmcsherry.bsky.social

Very excited to bring some column-orientation to timely and differential. At least, removing baked-in row-orientation in timely, and actual column-orientation in differential, with a bunch of cool learnings from the datatoad work. I hope. We'll see. :D

Moritz Hoffmann @antiguru.bsky.social · Sep 15

We just released Timely Dataflow 0.24! Here are some exciting changes from @frankmcsherry.bsky.social and myself.
The container abstractions got a complete rework, and we introduce a new pattern to distribute data. Details below.
github.com/TimelyDatafl...

Release timely-v0.24.0 · TimelyDataflow/timely-dataflow

This version of Timely has some exciting new features. The Distributor trait offers a generalization of the Exchange type. It allows users to define custom distribution strategies for routing data...

September 15, 2025 at 9:29 PM

Reposted by Frank McSherry

Moritz Hoffmann

@antiguru.bsky.social

We just released Timely Dataflow 0.24! Here are some exciting changes from @frankmcsherry.bsky.social and myself.
The container abstractions got a complete rework, and we introduce a new pattern to distribute data. Details below.
github.com/TimelyDatafl...

Release timely-v0.24.0 · TimelyDataflow/timely-dataflow

This version of Timely has some exciting new features. The Distributor trait offers a generalization of the Exchange type. It allows users to define custom distribution strategies for routing data...

September 15, 2025 at 7:44 PM

Frank McSherry

@frankmcsherry.bsky.social

I have a trip coming up, and I'm hoping to find some content to read about the implementations of (ideally interpreted) array languages. I'm on an interpreter kick, and armed with a bunch of column-oriented libraries.

Any tips, drop a reply!

August 29, 2025 at 5:25 PM

Frank McSherry

@frankmcsherry.bsky.social

I wrote a bit about datatoad's columnar logic for relational operators. At least, for union, intersection, antijoins, and semijoins. It turns out the joins are all easy; it's projection that is hard, of all things. Go figure.

github.com/frankmcsherr...
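A minimal Rust sketch of why semijoins and antijoins are friendly to columns, assuming a two-column relation sorted by key and a sorted, deduplicated list of keys (the `Rel2` type and `filter_by_keys` function are hypothetical). A merge over the key column alone decides which rows survive, and every column is then gathered with the same mask, so sort order and uniqueness carry over for free. Projection offers no such guarantee, which is one plausible reason it is the hard case: dropping a column can leave the remaining rows unsorted or duplicated, forcing a re-sort.

```rust
/// Two-column relation stored as columns, sorted by (key, val) and deduplicated.
struct Rel2 {
    keys: Vec<u32>,
    vals: Vec<u32>,
}

/// Keep rows of `r` whose key is in `s` (semijoin, `keep_matches = true`)
/// or is not in `s` (antijoin, `keep_matches = false`). `s` must be sorted.
fn filter_by_keys(r: &Rel2, s: &[u32], keep_matches: bool) -> Rel2 {
    // Merge the key column against `s` to build a keep/drop mask.
    let mut keep = Vec::with_capacity(r.keys.len());
    let mut j = 0;
    for &k in &r.keys {
        while j < s.len() && s[j] < k {
            j += 1;
        }
        let matched = j < s.len() && s[j] == k;
        keep.push(matched == keep_matches);
    }
    // Apply the same mask to every column; no rows are assembled.
    let gather = |col: &[u32]| -> Vec<u32> {
        col.iter().zip(&keep).filter(|&(_, &b)| b).map(|(&x, _)| x).collect()
    };
    Rel2 { keys: gather(&r.keys), vals: gather(&r.vals) }
}
```

With keys `[1, 1, 2, 4, 5]` and `s = [1, 4, 6]`, the semijoin keeps keys `[1, 1, 4]` and the antijoin keeps `[2, 5]`, each with their values in the original order.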

August 24, 2025 at 8:20 PM

Reposted by Frank McSherry

Martin Kleppmann

@martin.kleppmann.com

Notion's new offline support is based on our rich text CRDT research x.com/ivanhzhao/st...

Ivan Zhao on X: "For those of local first nerds and @inkandswitch fans: This is the paper co-authored by @sliminality @geoffreylitt @pvh Martin Kleppmann https://t.co/FMhf4olmg4 Thank you for laying the technical foundation for block-based, rich text CRDT for the world." / X

August 19, 2025 at 7:55 PM

Frank McSherry

@frankmcsherry.bsky.social

If you are in SF in November, I'll be speaking at syncconf.dev (@syncconf.bsky.social)!

It's an excellent confluence of all things up-to-data. Architectures like MZ at the backend, connected via sync engines, and front ends that don't waste anyone's time waiting on database queries.

Sync Conf | Nov 12, 2025 in San Francisco.

Sync Conf is a boutique conference on the future of real-time, collaborative, agentic software development. Happening Nov 12, 2025 in San Francisco.

August 19, 2025 at 11:09 PM

Frank McSherry

@frankmcsherry.bsky.social

In Datalog news: I had given up on getting (compiled) datafrog numbers for the "alias analysis" problem, because it is tedious to write. But thanks to an anonymous benefactor, it was coded up and we can now make a comparison between compiled datafrog and interpreted datatoad, on the same problem!

August 14, 2025 at 11:07 PM
