Lightnews — Scholar-powered news

Jack Vanlightly

@vanlightly.bsky.social

Durable functions have many names across frameworks, but they reduce to 3 forms:
stateless functions, sessions, actors.

Explainer here:
jack-vanlightly.com/blog/2025/12...

The Three Durable Function Forms — Jack Vanlightly

Durable execution engines (DEEs) talk about “workflows”, “activities”, “virtual objects”, “handlers”, and “functions”, but they’re often describing the same underlying execution patterns. This post pr...

jack-vanlightly.com

December 10, 2025 at 2:07 PM

Jack Vanlightly

@vanlightly.bsky.social

New post: The Durable Function Tree.
Durable execution engines all end up building some form of function tree with suspension points shaped by local vs remote side effects. I look at why, the trade-offs, and where orchestration should (and shouldn’t) be used.
jack-vanlightly.com/blog/2025/12...

The Durable Function Tree - Part 1 — Jack Vanlightly

In my last post I wrote about why and where determinism is needed in durable execution (DE). In this post I'm going to explore how workflows can be formed from trees of durable function calls ba...

jack-vanlightly.com

December 4, 2025 at 2:49 PM

Reposted by Jack Vanlightly

Gunnar Morling

@gunnarmorling.dev

📝 Blogged: "On Idempotency Keys"

Discussing several options for ensuring exactly-once processing in distributed systems using idempotency keys, from UUIDs to monotonically increasing sequences.

👉 www.morling.dev/blog/on-idem...

November 25, 2025 at 4:38 PM

Jack Vanlightly

@vanlightly.bsky.social

New blog post: Demystifying Determinism in Durable Execution

Why do durable execution frameworks care so much about determinism? I unpack the underlying mechanics.

Post: jack-vanlightly.com/blog/2025/11...

Demystifying Determinism in Durable Execution — Jack Vanlightly

Determinism is a key concept to understand when writing code using durable execution frameworks such as Temporal, Restate, DBOS, and Resonate. If you read the docs you see that some parts of your code...

jack-vanlightly.com

November 24, 2025 at 2:04 PM

Jack Vanlightly

@vanlightly.bsky.social

New blog post about Qbeast and how it brings a multidimensional spatial index to Iceberg/Delta.

🔹 Hypercube-based layout
🔹 Index used by writers, invisible to engines
🔹 Better locality + pruning, adaptive layout

Lots of innovation ahead in lakehouses.

jack-vanlightly.com/blog/2025/11...

Have your Iceberg Cubed, Not Sorted: Meet Qbeast, the OTree Spatial Index — Jack Vanlightly

In today’s post I want to walk through a fascinating indexing technique for data lakehouses which flips the role of the index in open table formats like Apache Iceberg and Delta Lake. We are going to...

jack-vanlightly.com

November 19, 2025 at 1:57 PM

Jack Vanlightly

@vanlightly.bsky.social

Stream-order vs batch-order in Iceberg:
* Flink wants temporal locality.
* Spark wants value locality.

Same table, conflicting physics.

New post: jack-vanlightly.com/blog/2025/11...

How Would You Like Your Iceberg Sir? Stream or Batch Ordered? — Jack Vanlightly

Today I want to talk about stream analytics, batch analytics and Apache Iceberg. Stream and batch analytics work differently but both can be built on top of Iceberg, but due to their differences there...

jack-vanlightly.com

November 5, 2025 at 2:52 PM

Jack Vanlightly

@vanlightly.bsky.social

Three KIPs (1150, 1176, 1183) all target Kafka’s cross-AZ replication costs but there is a wider question at stake.

My new post explains the KIPs and the trade-offs between reusing old abstractions and embracing stateless compute over S3.

jack-vanlightly.com/blog/2025/10...

A Fork in the Road: Deciding Kafka’s Diskless Future — Jack Vanlightly

“The Kafka community is currently seeing an unprecedented situation with three KIPs (KIP-1150, KIP-1176, KIP-1183) simultaneously addressing the same challenge of high replica...

jack-vanlightly.com

October 22, 2025 at 12:51 PM

Jack Vanlightly

@vanlightly.bsky.social

New post: why I’m not a fan of “zero-copy” Iceberg tables for Apache Kafka.
From a systems design view, it trades storage savings for coupling and complexity.
Sometimes, duplication is cheaper than coupling.
jack-vanlightly.com/blog/2025/10...

Why I’m not a fan of zero-copy Apache Kafka-Apache Iceberg — Jack Vanlightly

Over the past few months, I’ve seen a growing number of posts on social media promoting the idea of a “zero-copy” integration between Apache Kafka and Apache Iceberg. The idea is that Kafka topics cou...

jack-vanlightly.com

October 15, 2025 at 1:39 PM

Jack Vanlightly

@vanlightly.bsky.social

Why don’t Iceberg or Delta Lake have secondary indexes?
Because analytics workloads and OLTP workloads optimize for opposite I/O patterns.

See my dive into data layout, pruning, and what “indexing” really means in open table formats: jack-vanlightly.com/blog/2025/10...

Beyond Indexes: How Open Table Formats Optimize Query Performance — Jack Vanlightly

My career in data started as a SQL Server performance specialist, which meant I was deep into the nuances of indexes, locking and blocking, execution plan analysis and query design. These days I’m mor...

jack-vanlightly.com

October 8, 2025 at 1:01 PM

Jack Vanlightly

@vanlightly.bsky.social

New deep dive: Understanding Apache Fluss

I spent August reverse-engineering Fluss, Alibaba’s new table storage engine for Flink (partially forked from Kafka). This post covers its architecture, tiering, and how it tackles changelogs & low-latency state.

jack-vanlightly.com/blog/2025/9/...

Understanding Apache Fluss — Jack Vanlightly

This is a data system internals blog post. So if you enjoyed my table formats internals blog posts, or writing on Apache Kafka internals or Apache BookKeeper internals, you might enjoy thi...

jack-vanlightly.com

September 2, 2025 at 12:57 PM

Jack Vanlightly

@vanlightly.bsky.social

New blog post: A Conceptual Model for Storage Unification.

The post defines what storage unification means, defines terminology and evaluates different building blocks and approaches to doing it.

jack-vanlightly.com/blog/2025/8/...

A Conceptual Model for Storage Unification — Jack Vanlightly

Object storage is taking over more of the data stack, but low-latency systems still need separate hot-data storage. Storage unification is about presenting these heterogeneous storage systems and form...

jack-vanlightly.com

August 21, 2025 at 1:16 PM

Jack Vanlightly

@vanlightly.bsky.social

In a future of autonomous AI agents, we can't limit ourselves to error prevention and error detection; we must also include remediation.

jack-vanlightly.com/blog/2025/7/...

Remediation: What happens after AI goes wrong? — Jack Vanlightly

If you’re following the world of AI right now, no doubt you saw Jason Lemkin’s post on social media reporting how Replit’s AI deleted his production database, despite it being told not to touch an...

jack-vanlightly.com

July 28, 2025 at 12:17 PM

Jack Vanlightly

@vanlightly.bsky.social

Science moves slowly because wrong theories waste decades. Engineering is careful because failures kill people. Software moves fast because mistakes are cheap: the expensive error isn't making the wrong choice, it's taking too long to make any choice. jack-vanlightly.com/blog/2025/7/...

The Cost of Being Wrong — Jack Vanlightly

A recent LinkedIn post by Nick Lebesis caught my attention with this brutal take on the difference between good startup founders and coward startup founders. I recommend you read the entire thing ...

jack-vanlightly.com

July 22, 2025 at 3:09 PM

Jack Vanlightly

@vanlightly.bsky.social

Where does reliability begin, and where does it end? In distributed business architectures, the answer is responsibility boundaries. New post: jack-vanlightly.com/blog/2025/7/...

Responsibility Boundaries in the Coordinated Progress model — Jack Vanlightly

Building on my previous work on the Coordinated Progress model, this post examines how reliable triggers not only initiate work but also establish responsibility boundaries. Where a reliable tri...

jack-vanlightly.com

July 15, 2025 at 2:16 PM

Jack Vanlightly

@vanlightly.bsky.social

ChatGPT thought it was Tuesday, so I made fun of it and it admitted it was Wednesday. So I made fun of it again, and it admitted it was...Wednesday. But sure, AI agents are gonna steal my job 🤔

July 3, 2025 at 4:22 PM

Jack Vanlightly

@vanlightly.bsky.social

ChatGPT has hallucinated so many times for me today. It's invented scientific terms that don't exist, and has been quite liberal with plausible answers based on what sounds reasonable, but without any real-world justification. When challenged, it admits its mistake.

June 24, 2025 at 6:32 PM

Jack Vanlightly

@vanlightly.bsky.social

My musical evolution continues, discovered deep hypnotic drone music today. No drugs required 😄 The Hypnus Records label is great.

June 13, 2025 at 2:33 PM

Jack Vanlightly

@vanlightly.bsky.social

How to reliably distribute work across microservices, stream processors, durable execution, event-driven, orchestration and now AI agents?

Coordinated Progress is a 4 part series that explores the common structure behind reliable distributed systems.

jack-vanlightly.com/blog/2025/6/...

Coordinated Progress – Part 1 – Seeing the System: The Graph — Jack Vanlightly

At some point, we’ve all sat in an architecture meeting where someone asks, “Should this be an event? An RPC? A queue?”, or “How do we tie this process together across our microservices? Should it ...

jack-vanlightly.com

June 11, 2025 at 2:29 PM

Jack Vanlightly

@vanlightly.bsky.social

I took a break from social media and my blog for a couple of months. ND burnout. But I'm tentatively back, probably just to post my writing here for now. HOTDS is on pause. Getting back to writing is therapeutic though. I'll post something this week that I've been working on.

June 9, 2025 at 11:23 AM

Jack Vanlightly

@vanlightly.bsky.social

Another Humans of the Data Sphere is out, with issue 10! In this issue people are talking fsyncs, tips for running ClickHouse at scale, the problems with MCP and more. Plus I dig up a classic paper from 1962. www.hotds.dev/p/humans-of-...

Humans of the Data Sphere Issue #10 April 4th 2025

Your biweekly dose of insights, observations, commentary and opinions from interesting people from the world of databases, AI, streaming, distributed systems and the data engineering/analytics space.

www.hotds.dev

April 4, 2025 at 4:15 PM

Jack Vanlightly

@vanlightly.bsky.social

Proud to have contributed formal verification (TLA+) for three key improvements in Kafka 4.0:

✅ KIP-966: Strengthens the replication protocol.
✅ KIP-996: Introduces PreVote for more stable KRaft leadership.
✅ KIP-848: Delivers more efficient, predictable rebalancing.

April 3, 2025 at 4:00 PM

Jack Vanlightly

@vanlightly.bsky.social

Wow, I just discovered gamma wave music. Wrote non-stop for three hours.

March 25, 2025 at 1:01 PM

Jack Vanlightly

@vanlightly.bsky.social

Any Principal Engineers out there with ADHD or creative wiring — who don’t thrive in the tasks of project coordination, alignment meetings, and people management, but thrive on strategy, system design, writing, and shaping direction through ideas? Curious how you navigate the role.

March 21, 2025 at 1:52 PM

Jack Vanlightly

@vanlightly.bsky.social

A new disaggregated log replication survey post is out. How does the combination of Apache Pulsar with Apache BookKeeper divide and conquer the responsibilities of log replication? jack-vanlightly.com/blog/2025/3/...

Log Replication Disaggregation Survey - Apache Pulsar and BookKeeper — Jack Vanlightly

In this latest post of the disaggregated log replication survey, we’re going to look at the Apache BookKeeper Replication Protocol and how it is used by Apache Pulsar to form topic partitions. Raft ...

jack-vanlightly.com

March 13, 2025 at 12:53 PM

Jack Vanlightly

@vanlightly.bsky.social

I think I have an issue with tabs, it's grown to 371. My workstation is struggling to open Chrome after a restart now.

March 12, 2025 at 7:18 PM
