Jenny Kwan
@jennyckwan.bsky.social
Co-founder and CTO of AprioriDB. Software and data engineer. http://jennykwan.org/ https://www.linkedin.com/in/jennyckwan/
in part 1, I break down:
• why central data teams existed in the first place
• the 4 walls everyone hits trying to implement mesh
• why modern tools actually fight domain ownership

the real problem? we never fixed the infrastructure.

read it here:
🔗 jennyckwan.medium.com/data-mesh-wh...
Data Mesh: Why It’s Not Working (and How to Fix It) — Part 1
Data Mesh promised to liberate domain teams from centralized bottlenecks. But six years later, implementation walls persist. Part 1.
jennyckwan.medium.com
May 15, 2025 at 4:10 AM
Undo isn’t a luxury.

It’s the safety net that lets us move fast without fear.

It’s 2025. Let’s stop pretending rollbacks are good enough.

📖 Read the post here: jennyckwan.medium.com/its-2025-why...

Let me know what you think. What would undo change for you?

#Undo #DataEngineering #AprioriDB
It’s 2025. Why Is “Undo” Still Missing From Data Infrastructure?
Fixing data often means rebuilding everything. What if we had real undo — not just for the last step, but for any past mistake?
jennyckwan.medium.com
May 8, 2025 at 4:47 AM
In this post, I break down:

🔁 Why undo matters
❌ Why it’s missing
🧩 The 3 kinds of undo
🧱 What undo requires under the hood
🚀 What changes when we finally get it

And how we’re building that future at AprioriDB.
May 8, 2025 at 4:47 AM
We have undo in Google Docs.
Undo in Git.
Even undo in email.

But not in the data infrastructure we use to run billion-dollar decisions.

There, undo is still practically impossible.

That’s not just a missing feature — it’s a design failure.
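
One way to picture real undo (a toy sketch, not AprioriDB's design; `log`, `apply`, and `current_state` are made-up names): keep every step in an append-only log, and make undo itself just another log entry. Then you can undo *any* past step, not only the last one.

```python
# Undo as data, not as rollback: history is never rewritten,
# and undoing step N means appending a compensating entry.

log = []  # entries: ("set", key, value) or ("undo", index_of_entry)

def apply(entry):
    log.append(entry)

def current_state():
    """Derive state by replaying the log, skipping undone entries."""
    undone = {e[1] for e in log if e[0] == "undo"}
    state = {}
    for i, e in enumerate(log):
        if e[0] == "set" and i not in undone:
            state[e[1]] = e[2]
    return state

apply(("set", "price", 10))
apply(("set", "price", 12))   # a mistake made two steps ago...
apply(("set", "tax", 2))
apply(("undo", 1))            # ...undone later, without touching anything else
assert current_state() == {"price": 10, "tax": 2}
```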
May 8, 2025 at 4:47 AM
Let's demand data platforms that offer true historical control & permanence! What's your biggest frustration with current "time travel"?

(P.S. We're building this foundation at AprioriDB. Looking for passionate engineers/architects to collaborate! Check the site linked in the post 😉)
May 1, 2025 at 3:05 PM
I dive deep into these limitations & a better approach (incl. concrete bitemporal query examples!) in my latest blog post. If you're tired of ephemeral history and messy corrections, check it out:

jennyckwan.medium.com/lakehouse-ti...
Lakehouse Time Travel Deletes History. You Need Forks & Permanence.
Delta Lake & Iceberg offer read-only, expiring history snapshots. Real data control fixes the past and keeps it forever.
jennyckwan.medium.com
May 1, 2025 at 3:05 PM
The fix? Stop versioning bulky files! Version the lightweight *semantics* (logic + context). Use bitemporality & model corrections as *logical forks*.

This allows potentially INFINITE logical history – even if old data *files* are pruned, they can be deterministically recomputed. 🤯
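
A toy sketch of what that means (illustrative only; `semantic_version` and the inline lambda are made-up stand-ins, not AprioriDB's API):

```python
import hashlib
import json

def semantic_version(logic_source: str, context: dict) -> str:
    """Version the lightweight *meaning* (logic + context),
    not the bulky output files it produced."""
    payload = json.dumps({"logic": logic_source, "context": context},
                         sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()[:12]

# The semantics are tiny and cheap to keep forever.
LOGIC = "lambda rows: [r + 1 for r in rows]"
CONTEXT = {"input": "sales/2025-04-10", "tz": "UTC"}
VERSION = semantic_version(LOGIC, CONTEXT)

def recompute(logic_source: str, input_rows: list) -> list:
    """If the physical files for VERSION were pruned, rerun the
    recorded logic on the recorded inputs to rebuild them exactly."""
    fn = eval(logic_source)  # stand-in for a real, sandboxed evaluator
    return fn(input_rows)

assert recompute(LOGIC, [10, 20]) == [11, 21]
```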
May 1, 2025 at 3:05 PM
This also leads to the correction nightmare. Trying to fix data for 'last Tuesday' *today* tangles System Time (when the fix ran) & Effective Time (when the data was valid).

You need BITEMPORALITY to distinguish: 'show data *about* Tuesday *as known now*' vs. 'show data *as it looked system-wise* on Tuesday'.
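
A tiny Python sketch of the two timelines (hypothetical names, not a real API):

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class Fact:
    key: str
    value: float
    effective: date  # Effective Time: when the fact was true in the world
    recorded: date   # System Time: when the system learned it

def as_of(facts, key, effective, known_by):
    """Value for `key` on `effective`, using only what the
    system knew on or before `known_by`."""
    hits = [f for f in facts
            if f.key == key and f.effective <= effective
            and f.recorded <= known_by]
    if not hits:
        return None
    # Latest effective fact wins; ties go to the latest recording,
    # so corrections override originals.
    return max(hits, key=lambda f: (f.effective, f.recorded)).value

# Tuesday's revenue was recorded as 100, then corrected to 120 on Friday.
facts = [Fact("revenue", 100.0, date(2025, 4, 22), date(2025, 4, 22)),
         Fact("revenue", 120.0, date(2025, 4, 22), date(2025, 4, 25))]

# Data *about* Tuesday, *as known now*: the corrected value.
assert as_of(facts, "revenue", date(2025, 4, 22), date(2025, 4, 26)) == 120.0
# Data as it looked *system-wise* on Tuesday: the original value.
assert as_of(facts, "revenue", date(2025, 4, 22), date(2025, 4, 22)) == 100.0
```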
May 1, 2025 at 3:05 PM
The core issue: They version *physical files* (Parquet, etc.), not the *semantic meaning* (the logic + context + effective time) that CREATED them.

Knowing *which files* existed ≠ knowing *why* or being able to fix past logic cleanly. Plus, storing infinite files is $$$, hence forced deletion.
May 1, 2025 at 3:05 PM
📖 Read it here:
Your ETL Pipeline is Wasting Time (And It’s Not Your Fault) — Part 1
jennyckwan.medium.com/youre-not-cr...

More coming soon. If you’ve felt this pain, I’d love to hear your story. (7/7)
You’re Not Crazy — ETL Wastes Time on Unnecessary Work
Why do ETL pipelines constantly redo work? Part 1 of a series exploring the root causes of waste in modern data processing.
jennyckwan.medium.com
April 24, 2025 at 10:24 PM
We need data systems that remember:
• Logic.
• Context.
• Time.
• Causal structure.

Not just “what is,” but “how it came to be.”

That’s what I wrote about in Part 1 of a new blog series. (6/7)
April 24, 2025 at 10:24 PM
So we rebuild everything.

Because the foundation gives us no way to say:

“This hasn’t changed. Skip it.”
“That logic was wrong, but only back then.”
“This result already exists. Reuse it.”

That’s where the waste comes from. (5/7)
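
What would "skip it" take? Roughly this (hypothetical names, not any real platform's API): fingerprint the logic plus its inputs, and reuse any result whose fingerprint you've already seen.

```python
import hashlib
import json

def fingerprint(logic: str, inputs: list[str]) -> str:
    """The identity of a computation: what ran, over what."""
    blob = json.dumps([logic, sorted(inputs)])
    return hashlib.sha256(blob.encode()).hexdigest()

results: dict[str, object] = {}  # fingerprint -> previously computed result

def run_step(logic: str, inputs: list[str], compute):
    fp = fingerprint(logic, inputs)
    if fp in results:
        return results[fp]   # "This result already exists. Reuse it."
    results[fp] = compute()  # only pay for actual change
    return results[fp]

# Second call is a no-op: same logic, same inputs, nothing changed.
run_step("daily_totals_v1", ["sales/2025-04-10"], lambda: sum([1, 2, 3]))
run_step("daily_totals_v1", ["sales/2025-04-10"], lambda: sum([1, 2, 3]))
```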
April 24, 2025 at 10:24 PM
This isn’t accidental waste. It’s designed in.

Our platforms forget:
• The logic (expression) that created the data.
• The context (provenance) it ran in.
• The time semantics of “when” vs. “for when.”

They can’t reason about change. (4/7)
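
Concretely, the minimum record a platform would need to keep per output looks something like this (an illustrative shape, not a real schema):

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Provenance:
    expression: str           # the logic (expression) that created the data
    context: dict             # provenance: inputs, code version, config
    system_time: datetime     # "when": when the job actually ran
    effective_time: datetime  # "for when": the period the data describes
```

With those four fields on every output, "did anything change?" becomes a question the platform can actually answer.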
April 24, 2025 at 10:24 PM
Backfills? Expensive. Fragile. Slow.

And worse — they lie. They overwrite past meaning with present-day logic.

Recomputing April 10th with April 17th code isn’t a fix. It’s a temporal paradox. (3/7)
April 24, 2025 at 10:24 PM
We avoid UPDATE and MERGE. We rewrite entire days of data with INSERT. Why?

Because idempotent full rewrites became the only safe pattern on platforms that forget how data was made.

This isn’t optimization. It’s coping. (2/7)
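
The pattern in miniature (a sqlite-flavored toy; the table and columns are made up): delete the whole day and re-insert it in one transaction, so rerunning the job always lands in the same state.

```python
import sqlite3

def rewrite_day(conn: sqlite3.Connection, day: str, rows: list[tuple]) -> None:
    """The coping pattern: no UPDATE, no MERGE. Delete the whole
    partition and re-INSERT it, so reruns are idempotent."""
    with conn:  # one transaction: all-or-nothing
        conn.execute("DELETE FROM metrics WHERE day = ?", (day,))
        conn.executemany(
            "INSERT INTO metrics (day, key, value) VALUES (?, ?, ?)",
            [(day, k, v) for k, v in rows],
        )

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metrics (day TEXT, key TEXT, value REAL)")
rewrite_day(conn, "2025-04-10", [("clicks", 42.0), ("orders", 7.0)])
rewrite_day(conn, "2025-04-10", [("clicks", 42.0), ("orders", 7.0)])  # same state
```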
April 24, 2025 at 10:24 PM