Weston Pace
westonpace.bsky.social
Software developer working on all things Arrow and columnar storage; currently, Lance.
Also, I guess it's "cleaning the filter", not changing it
November 2, 2025 at 4:28 PM
Maybe only a thing on some washing machines (or only when you have enough pet hair in your home 😅)

My old washing machine had to be taken apart to change the filter but my new one has a little door.
November 2, 2025 at 4:27 PM
Resetting the garbage disposal with an Allen wrench. Cleaning the filter on the clothes washer. Testing and replacing smoke alarms.
November 2, 2025 at 4:18 PM
Ah, I ran into something very similar yesterday with an async "find or insert" cache. The first caller canceled the request while the insert future was in progress (dropped the future) and that cache key was forever blocked.
October 31, 2025 at 7:27 PM
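The stuck-key failure described in the October 31 post about the async "find or insert" cache can be sketched roughly as below. This is only an illustration under assumptions: the post does not show the original code, so the class `FindOrInsertCache`, the methods `get_naive` and `get_safe`, and `slow_loader` are all hypothetical names, and Python's asyncio task cancellation stands in for dropping the future. The "safe" variant is one possible mitigation, not necessarily the fix that was used.

```python
import asyncio


class FindOrInsertCache:
    """Toy async cache (hypothetical). Maps key -> awaitable holding the value."""

    def __init__(self):
        self._entries = {}

    async def get_naive(self, key, loader):
        if key in self._entries:
            # Someone else is inserting (or has inserted); wait for them.
            return await self._entries[key]
        fut = asyncio.get_running_loop().create_future()
        self._entries[key] = fut
        # BUG: if this caller is cancelled during the await below, `fut` is
        # never resolved and never removed, so the key stays blocked.
        value = await loader(key)
        fut.set_result(value)
        return value

    async def get_safe(self, key, loader):
        if key not in self._entries:
            # Run the load as its own task so it outlives any single caller.
            self._entries[key] = asyncio.create_task(loader(key))
        # shield() keeps a cancelled caller from cancelling the shared task.
        return await asyncio.shield(self._entries[key])


async def slow_loader(key):
    await asyncio.sleep(0.05)  # stand-in for a slow fetch
    return f"value-for-{key}"


async def demo(method_name):
    cache = FindOrInsertCache()
    get = getattr(cache, method_name)
    first = asyncio.create_task(get("k", slow_loader))
    await asyncio.sleep(0.01)  # let the first caller start the insert
    first.cancel()             # first caller gives up mid-insert
    try:
        return await asyncio.wait_for(get("k", slow_loader), timeout=0.2)
    except asyncio.TimeoutError:
        return "blocked"


if __name__ == "__main__":
    print(asyncio.run(demo("get_naive")))  # second caller times out
    print(asyncio.run(demo("get_safe")))   # second caller gets the value
```

In the naive variant the placeholder is owned by the first caller, so cancelling that caller strands everyone else. The shielded-task variant decouples the insert from whoever asked first; a fuller version would also evict the entry when the loader fails.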
Nice definition! This matches my use. I also usually have a touch of "please don't hate me I'm doing my best"
October 29, 2025 at 8:05 PM
October 17, 2025 at 6:01 PM
Your coworkers about to flood the channel because "I guess he doesn't want threads for this one"
ALT: a black and white photo of a woman wearing a turtleneck sweater and a dress .
October 16, 2025 at 2:56 PM
I suspect this will change as caching layers become more mature. The selectivity threshold for cloud storage is something like "one in a million" but more like "one in a thousand" for NVMe.

Also, a self-promotional shout out that you might want to look at lance (lancedb.github.io/lance/format...)
October 8, 2025 at 9:46 PM
They do a bit of both. The base model is unsupervised and is generally described as "learning the language". The model is then fine-tuned with supervision for a specific task.

The "suck up as much data as you can" is for the first part.
October 7, 2025 at 11:38 PM
Though I think the "we can't change Parquet" problem is a bit of a false problem. 90% of Parquet users are probably fine to just keep using Parquet. I'm not sure I agree that "the long time archival format" and the "database storage format" need to be the same thing.
October 3, 2025 at 9:38 PM
That might be next week's blog post ;). Short answer is I see it as a table format problem and not a file format problem. Change "decoder" to "file reader". Change "stored in the page" to "stored in a folder on the table" and change "wasm" to "pluggable" (native or wasm).
October 3, 2025 at 9:38 PM
Hope this helps, it's fun to see so much exciting innovation in a space that's been relatively quiet for many years!
October 3, 2025 at 5:18 PM
F3 is from a joint project between CMU and Tsinghua University. They have tackled the "forwards compatibility" problem by storing WASM decoders with the data so that old readers can read data written by futuristic writers.
October 3, 2025 at 5:18 PM
FastLanes comes from CWI. They're the group that's designed some of the new lightweight compression algorithms (e.g. FSST). They definitely focus on compression and they likely have the best layout for processing data already in memory.
October 3, 2025 at 5:18 PM
Vortex comes from SpiralDB. They've done a good job explaining what they do and writing about it. They've focused heavily on compression and, especially, on pushing compute down to run against compressed data.
October 3, 2025 at 5:18 PM
Nimble comes from Meta, and there has sadly not been much written about it publicly. The best I can say at the moment is that Nimble has placed perhaps the biggest emphasis of all the formats on extremely wide schemas (again, all formats have done some work here).
October 3, 2025 at 5:18 PM