Jenny Kwan
@jennyckwan.bsky.social
Co-founder and CTO of AprioriDB. Software and data engineer. http://jennykwan.org/ https://www.linkedin.com/in/jennyckwan/
in part 1, I break down:
• why central data teams existed in the first place
• the 4 walls everyone hits trying to implement mesh
• why modern tools actually fight domain ownership

the real problem? we never fixed the infrastructure.

read it here:
🔗 jennyckwan.medium.com/data-mesh-wh...
Data Mesh: Why It’s Not Working (and How to Fix It) — Part 1
Data Mesh promised to liberate domain teams from centralized bottlenecks. But six years later, implementation walls persist. Part 1.
jennyckwan.medium.com
May 15, 2025 at 4:10 AM
Undo isn’t a luxury.

It’s the safety net that lets us move fast without fear.

It’s 2025. Let’s stop pretending rollbacks are good enough.

📖 Read the post here: jennyckwan.medium.com/its-2025-why...

Let me know what you think. What would undo change for you?

#Undo #DataEngineering #AprioriDB
It’s 2025. Why Is “Undo” Still Missing From Data Infrastructure?
Fixing data often means rebuilding everything. What if we had real undo — not just for the last step, but for any past mistake?
jennyckwan.medium.com
May 8, 2025 at 4:47 AM
In this post, I break down:

🔁 Why undo matters
❌ Why it’s missing
🧩 The 3 kinds of undo
🧱 What undo requires under the hood
🚀 What changes when we finally get it

And how we’re building that future at AprioriDB.
May 8, 2025 at 4:47 AM
We have undo in Google Docs.
Undo in Git.
Even undo in email.

But not in the data infrastructure we use to run billion-dollar decisions.

There, undo is still practically impossible.

That’s not just a missing feature — it’s a design failure.
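
One way to picture real undo (a toy sketch, not AprioriDB's design; `log`, `apply`, and `current_state` are made-up names): keep every step in an append-only log, and make undo itself just another log entry. Then you can undo *any* past step, not only the last one.

```python
# Undo as data, not as rollback: history is never rewritten,
# and undoing step N means appending a compensating entry.

log = []  # entries: ("set", key, value) or ("undo", index_of_entry)

def apply(entry):
    log.append(entry)

def current_state():
    """Derive state by replaying the log, skipping undone entries."""
    undone = {e[1] for e in log if e[0] == "undo"}
    state = {}
    for i, e in enumerate(log):
        if e[0] == "set" and i not in undone:
            state[e[1]] = e[2]
    return state

apply(("set", "price", 10))
apply(("set", "price", 12))   # a mistake made two steps ago...
apply(("set", "tax", 2))
apply(("undo", 1))            # ...undone later, without touching anything else
assert current_state() == {"price": 10, "tax": 2}
```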
May 8, 2025 at 4:47 AM
Let's demand data platforms that offer true historical control & permanence! What's your biggest frustration with current "time travel"?

(P.S. We're building this foundation at AprioriDB. Looking for passionate engineers/architects to collaborate! Check the site linked in the post 😉)
May 1, 2025 at 3:05 PM
I dive deep into these limitations & a better approach (incl. concrete bitemporal query examples!) in my latest blog post. If you're tired of ephemeral history and messy corrections, check it out:

jennyckwan.medium.com/lakehouse-ti...
Lakehouse Time Travel Deletes History. You Need Forks & Permanence.
Delta Lake & Iceberg offer read-only, expiring history snapshots. Real data control fixes the past and keeps it forever.
jennyckwan.medium.com
May 1, 2025 at 3:05 PM
The fix? Stop versioning bulky files! Version the lightweight *semantics* (logic + context). Use bitemporality & model corrections as *logical forks*.

This allows potentially INFINITE logical history – even if old data *files* are pruned, they can be deterministically recomputed. 🤯
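
A toy sketch of what that means (illustrative only; `semantic_version` and the inline lambda are made-up stand-ins, not AprioriDB's API):

```python
import hashlib
import json

def semantic_version(logic_source: str, context: dict) -> str:
    """Version the lightweight *meaning* (logic + context),
    not the bulky output files it produced."""
    payload = json.dumps({"logic": logic_source, "context": context},
                         sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()[:12]

# The semantics are tiny and cheap to keep forever.
LOGIC = "lambda rows: [r + 1 for r in rows]"
CONTEXT = {"input": "sales/2025-04-10", "tz": "UTC"}
VERSION = semantic_version(LOGIC, CONTEXT)

def recompute(logic_source: str, input_rows: list) -> list:
    """If the physical files for VERSION were pruned, rerun the
    recorded logic on the recorded inputs to rebuild them exactly."""
    fn = eval(logic_source)  # stand-in for a real, sandboxed evaluator
    return fn(input_rows)

assert recompute(LOGIC, [10, 20]) == [11, 21]
```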
May 1, 2025 at 3:05 PM
This also leads to the correction nightmare. Trying to fix data for 'last Tuesday' *today* tangles System Time (when the fix ran) & Effective Time (when the data was valid).

You need BITEMPORALITY to distinguish: 'show data *about* Tuesday *as known now*' vs. 'show data *as it looked system-wise* on Tuesday'.
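
A tiny Python sketch of the two timelines (hypothetical names, not a real API):

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class Fact:
    key: str
    value: float
    effective: date  # Effective Time: when the fact was true in the world
    recorded: date   # System Time: when the system learned it

def as_of(facts, key, effective, known_by):
    """Value for `key` on `effective`, using only what the
    system knew on or before `known_by`."""
    hits = [f for f in facts
            if f.key == key and f.effective <= effective
            and f.recorded <= known_by]
    if not hits:
        return None
    # Latest effective fact wins; ties go to the latest recording,
    # so corrections override originals.
    return max(hits, key=lambda f: (f.effective, f.recorded)).value

# Tuesday's revenue was recorded as 100, then corrected to 120 on Friday.
facts = [Fact("revenue", 100.0, date(2025, 4, 22), date(2025, 4, 22)),
         Fact("revenue", 120.0, date(2025, 4, 22), date(2025, 4, 25))]

# Data *about* Tuesday, *as known now*: the corrected value.
assert as_of(facts, "revenue", date(2025, 4, 22), date(2025, 4, 26)) == 120.0
# Data as it looked *system-wise* on Tuesday: the original value.
assert as_of(facts, "revenue", date(2025, 4, 22), date(2025, 4, 22)) == 100.0
```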
May 1, 2025 at 3:05 PM
The core issue: They version *physical files* (Parquet, etc.), not the *semantic meaning* (the logic + context + effective time) that CREATED them.

Knowing *which files* existed ≠ knowing *why* or being able to fix past logic cleanly. Plus, storing infinite files is $$$, hence forced deletion.
May 1, 2025 at 3:05 PM
📖 Read it here:
Your ETL Pipeline is Wasting Time (And It’s Not Your Fault) — Part 1
jennyckwan.medium.com/youre-not-cr...

More coming soon. If you’ve felt this pain, I’d love to hear your story. (7/7)
You’re Not Crazy — ETL Wastes Time on Unnecessary Work
Why do ETL pipelines constantly redo work? Part 1 of a series exploring the root causes of waste in modern data processing.
jennyckwan.medium.com
April 24, 2025 at 10:24 PM
We need data systems that remember:
• Logic.
• Context.
• Time.
• Causal structure.

Not just “what is,” but “how it came to be.”

That’s what I wrote about in Part 1 of a new blog series. (6/7)
April 24, 2025 at 10:24 PM
So we rebuild everything.

Because the foundation gives us no way to say:

“This hasn’t changed. Skip it.”
“That logic was wrong, but only back then.”
“This result already exists. Reuse it.”

That’s where the waste comes from. (5/7)
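
What would "skip it" take? Roughly this (hypothetical names, not any real platform's API): fingerprint the logic plus its inputs, and reuse any result whose fingerprint you've already seen.

```python
import hashlib
import json

def fingerprint(logic: str, inputs: list[str]) -> str:
    """The identity of a computation: what ran, over what."""
    blob = json.dumps([logic, sorted(inputs)])
    return hashlib.sha256(blob.encode()).hexdigest()

results: dict[str, object] = {}  # fingerprint -> previously computed result

def run_step(logic: str, inputs: list[str], compute):
    fp = fingerprint(logic, inputs)
    if fp in results:
        return results[fp]   # "This result already exists. Reuse it."
    results[fp] = compute()  # only pay for actual change
    return results[fp]

# Second call is a no-op: same logic, same inputs, nothing changed.
run_step("daily_totals_v1", ["sales/2025-04-10"], lambda: sum([1, 2, 3]))
run_step("daily_totals_v1", ["sales/2025-04-10"], lambda: sum([1, 2, 3]))
```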
April 24, 2025 at 10:24 PM
This isn’t accidental waste. It’s designed in.

Our platforms forget:
• The logic (expression) that created the data.
• The context (provenance) it ran in.
• The time semantics of “when” vs. “for when.”

They can’t reason about change. (4/7)
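
Concretely, the minimum record a platform would need to keep per output looks something like this (an illustrative shape, not a real schema):

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Provenance:
    expression: str           # the logic (expression) that created the data
    context: dict             # provenance: inputs, code version, config
    system_time: datetime     # "when": when the job actually ran
    effective_time: datetime  # "for when": the period the data describes
```

With those four fields on every output, "did anything change?" becomes a question the platform can actually answer.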
April 24, 2025 at 10:24 PM
Backfills? Expensive. Fragile. Slow.

And worse — they lie. They overwrite past meaning with present-day logic.

Recomputing April 10th with April 17th code isn’t a fix. It’s a temporal paradox. (3/7)
April 24, 2025 at 10:24 PM
We avoid UPDATE and MERGE. We rewrite entire days of data with INSERT. Why?

Because idempotent full rewrites became the only safe pattern on platforms that forget how data was made.

This isn’t optimization. It’s coping. (2/7)
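
The pattern in miniature (a sqlite-flavored toy; the table and columns are made up): delete the whole day and re-insert it in one transaction, so rerunning the job always lands in the same state.

```python
import sqlite3

def rewrite_day(conn: sqlite3.Connection, day: str, rows: list[tuple]) -> None:
    """The coping pattern: no UPDATE, no MERGE. Delete the whole
    partition and re-INSERT it, so reruns are idempotent."""
    with conn:  # one transaction: all-or-nothing
        conn.execute("DELETE FROM metrics WHERE day = ?", (day,))
        conn.executemany(
            "INSERT INTO metrics (day, key, value) VALUES (?, ?, ?)",
            [(day, k, v) for k, v in rows],
        )

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metrics (day TEXT, key TEXT, value REAL)")
rewrite_day(conn, "2025-04-10", [("clicks", 42.0), ("orders", 7.0)])
rewrite_day(conn, "2025-04-10", [("clicks", 42.0), ("orders", 7.0)])  # same state
```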
April 24, 2025 at 10:24 PM