Lightnews — Scholar-powered news

Weston Pace

@westonpace.bsky.social

Also, I guess it's "cleaning the filter" not changing it

November 2, 2025 at 4:28 PM

Weston Pace

@westonpace.bsky.social

Maybe only a thing on some washing machines (or only when you have enough pet hair in your home 😅)

My old washing machine had to be taken apart to change the filter but my new one has a little door.

November 2, 2025 at 4:27 PM

Weston Pace

@westonpace.bsky.social

Resetting the garbage disposal with an Allen wrench. Changing the filter on the clothes washer. Testing and replacing smoke alarms.

November 2, 2025 at 4:18 PM

Weston Pace

@westonpace.bsky.social

Ah, I ran into something very similar yesterday with an async "find or insert" cache. The first caller canceled the request while the insert future was in progress (dropped the future) and that cache key was forever blocked.

October 31, 2025 at 7:27 PM
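
One common way around the "forever blocked key" failure described above is to let the cache, rather than the first caller, own the in-flight insert. The sketch below is a Python asyncio analogue, not the code from the post (which is not shown); `FindOrInsertCache`, `slow_load`, and `demo` are hypothetical names made up for illustration.

```python
import asyncio


class FindOrInsertCache:
    """Hypothetical async find-or-insert cache; all names are illustrative."""

    def __init__(self, loader):
        self._loader = loader  # async function: key -> value
        self._tasks = {}       # key -> asyncio.Task owned by the cache

    async def get(self, key):
        task = self._tasks.get(key)
        if task is None:
            # The cache, not the caller, owns the in-flight insert.
            task = asyncio.ensure_future(self._loader(key))
            self._tasks[key] = task
            task.add_done_callback(lambda t, k=key: self._evict_if_failed(k, t))
        # shield() means cancelling this caller does not cancel the insert.
        return await asyncio.shield(task)

    def _evict_if_failed(self, key, task):
        # Drop failed or cancelled inserts so a later caller can retry.
        if task.cancelled() or task.exception() is not None:
            self._tasks.pop(key, None)


async def demo():
    calls = []

    async def slow_load(key):
        calls.append(key)
        await asyncio.sleep(0.05)
        return key.upper()

    cache = FindOrInsertCache(slow_load)
    first = asyncio.ensure_future(cache.get("a"))
    await asyncio.sleep(0.01)  # let the insert start
    first.cancel()             # first caller gives up mid-insert
    # The second caller still gets a value instead of waiting forever.
    second = await asyncio.wait_for(cache.get("a"), timeout=1.0)
    return second, calls
```

Running `asyncio.run(demo())` cancels the first caller while the insert is in progress, and the second caller still completes against the same in-flight load. The trade-off is that a cancelled request no longer stops the underlying work.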

Weston Pace

@westonpace.bsky.social

Nice definition! This matches my use. I also usually have a touch of "please don't hate me I'm doing my best"

October 29, 2025 at 8:05 PM

Weston Pace

@westonpace.bsky.social

The awkward monkey puppet meme with the text "Well..." from a maintainer of Lance, a lakehouse format that might just happen to be what the author is describing...

October 17, 2025 at 6:01 PM

Weston Pace

@westonpace.bsky.social

Your coworkers about to flood the channel because "I guess he doesn't want threads for this one"

ALT: a black and white photo of a woman wearing a turtleneck sweater and a dress .

media.tenor.com

October 16, 2025 at 2:56 PM

Weston Pace

@westonpace.bsky.social

I suspect this will change as caching layers become more mature. The selectivity threshold for cloud storage is something like "one in a million" but more like "one in a thousand" for NVMe.

Also, a self-promotional shout-out that you might want to look at Lance (lancedb.github.io/lance/format...)

October 8, 2025 at 9:46 PM

Weston Pace

@westonpace.bsky.social

They do a bit of both. The base model is unsupervised and is generally described as "learning the language". The model is then fine-tuned with supervision for a specific task.

The "suck up as much data as you can" is for the first part.

October 7, 2025 at 11:38 PM

Weston Pace

@westonpace.bsky.social

Though I think the "we can't change Parquet" problem is a bit of a false problem. 90% of Parquet users are probably fine to just keep using Parquet. I'm not sure I agree that "the long-term archival format" and the "database storage format" need to be the same thing.

October 3, 2025 at 9:38 PM

Weston Pace

@westonpace.bsky.social

That might be next week's blog post ;). Short answer is I see it as a table format problem and not a file format problem. Change "decoder" to "file reader". Change "stored in the page" to "stored in a folder on the table" and change "wasm" to "pluggable" (native or wasm).

October 3, 2025 at 9:38 PM

Weston Pace

@westonpace.bsky.social

Hope this helps. It's fun to see so much exciting innovation in a space that's been relatively quiet for many years!

October 3, 2025 at 5:18 PM

Weston Pace

@westonpace.bsky.social

F3 is from a joint project between CMU and Tsinghua University. They have tackled the "forwards compatibility" problem by storing WASM decoders with the data so that old readers can read data written by futuristic writers.

October 3, 2025 at 5:18 PM

Weston Pace

@westonpace.bsky.social

FastLanes comes from CWI. They're the group that's designed some of the new lightweight compression algorithms (e.g. FSST). They definitely focus on compression and they likely have the best layout for processing data already in memory.

October 3, 2025 at 5:18 PM

Weston Pace

@westonpace.bsky.social

Vortex comes from SpiralDB. They've done a good job explaining what they do and writing about it. They've put a big focus on compression and, especially, on pushing down compute to run against compressed data.

October 3, 2025 at 5:18 PM

Weston Pace

@westonpace.bsky.social

Nimble comes from Meta, and there has sadly not been much written about it publicly. The best I can say at the moment is that Nimble has placed perhaps the biggest emphasis of all the formats on extremely wide schemas (again, all formats have done some work here).

October 3, 2025 at 5:18 PM
