Mahdi Karabiben
mahdiqb.bsky.social
Product @Neo4j. Ex-Zendesk. I love hearing what the data has to say. Views are my own. he/him.
In 2017, we redefined "Data Engineering." Today, we're at a similar inflection point for the Semantic Layer.

In 2022 (MDS R.I.P.), we treated it as a fancy SQL generator: Thousands of YAML lines to calculate metrics for dashboards.

Deep down, we all knew it was overkill. 1/4
November 27, 2025 at 1:47 PM
We've solved observability for dashboards. Now we need to solve it for agents.

Had a great time writing for the Metadata Weekly newsletter about the shift from Data Trust to Decision Trust:

metadataweekly.substack.com/cp/179547368
From Data Trust to Decision Trust: The Case for Unified Data + AI Observability
Data observability was built to catch errors for humans. Unified observability is built to control risk for autonomous AI.
metadataweekly.substack.com
November 22, 2025 at 1:21 AM
Smart Brevity by Jim VandeHei, Mike Allen, and Roy Schwartz should be mandatory reading before you're allowed to send your 1st email/Slack message.

And it's more vital than ever now, since we upgraded from "human-generated fluff" to "AI-supercharged fluff". 1/2
October 14, 2025 at 9:04 PM
The dbt Labs-Fivetran merger is such a Logan Roy move by Fivetran - they now own all three data transformation tools that emerged from the MDS: dbt, SQLMesh, and SDF (via dbt Labs).
The post-MDS world is full of surprises.
October 13, 2025 at 5:26 PM
Excited to share that my masterclass, "Data Modeling for Data Products," is now available on-demand via ModernData101!

If you're a Data PM, Analytics Engineer, or Data Engineer focused on building valuable (& scalable) data products, this session is for you! 1/2
October 11, 2025 at 8:30 PM
Peak Paris is going to a (fantastic) contemporary dance show at a department store - (Babel at Le Bon Marché👌🏼)
September 7, 2025 at 8:41 PM
Very interesting article by the @apachedatafusion.bsky.social team on user-defined/custom indexes in Parquet - really surprised that other Parquet readers/writers don't leverage this, given that file pruning remains limited with "vanilla" Parquet in many scenarios.
datafusion.apache.org/blog/2025/07...
Embedding User-Defined Indexes in Apache Parquet Files - Apache DataFusion Blog
datafusion.apache.org
August 16, 2025 at 4:24 PM
Data Espresso #12 is out ☕
This edition covers:
- A modern playbook for data modeling in a product-driven world
- Why we need a "Spotify Wrapped for everything"
- A two-step formula for building data products that actually matter
dataespresso.substack.com/p/espresso-1...
1/4
Espresso #12: Data modeling for data products, a Spotify Wrapped for everything, and building things that matter
A modern playbook for data modeling in a product-driven world, how AI can power a supercharged Spotify Wrapped, and a two-step formula for building valuable data products.
dataespresso.substack.com
August 4, 2025 at 7:51 AM
Data modeling is back (and it's good news!), but we can't just copy-paste the old playbook. Instead, there's a big opportunity to adapt existing data modeling methodologies to today's world: data products, limitless compute, and a big need for speed. 1/2
June 30, 2025 at 9:22 PM
Today's data platforms are powerful, but the actual experience is still clunky (tool hopping, siloed metadata, etc.). The missing piece is the last mile/experience layer – a polished UI/UX layer connecting all the backend systems. A great example is Airbnb's data timeliness UI: shorturl.at/3n5w9
1/5
Visualizing Data Timeliness at Airbnb
by Chris Williams, Ken Chen, Krist Wongsuphasawat, and Sylvia Tomiyama
medium.com
May 31, 2025 at 11:12 PM
This is a great post by the Discord data team about how they augmented several dbt features like materializations and macros.

The "meta" attribute continues to be severely underused by data teams, but they neatly leverage it here for custom versioning.
discord.com/blog/overclo...
Overclocking dbt: Discord's Custom Solution in Processing Petabytes of Data
Explore how Discord supercharged dbt with a tailored solution designed for performance, developer productivity, and data quality.
discord.com
April 29, 2025 at 11:02 PM
Data Espresso #10 is out ☕️
This edition covers:
- If/when you should migrate to Apache Iceberg
- The benefits of revisiting design decisions (and why you should do it often)
- How constraint-heavy environments can foster engineering ingenuity
open.substack.com/pub/dataespr...
1/4
Espresso #10: A new ice(berg) age, revisiting old designs, and thriving on constraints
Hello data friends,
open.substack.com
April 16, 2025 at 5:00 PM
Now that the semantic layer is back in the data space's spotlight, it's definitely worth revisiting adjacent concepts like metric trees. If you're unfamiliar with the term, Trace has a fantastic (and brief) intro that's worth your time: www.hellotrace.io/blog/introdu...
Trace - Introduction to Metric Trees
A proper introduction to metric trees with a real world example
www.hellotrace.io
February 9, 2025 at 3:52 PM
Ever since I started working in data back in 2017, I've heard the same prediction every year: "next year will be the year of streaming!" - years came and went but the much-hyped real-time analytics revolution never materialized. However, I genuinely think this may be THE year - hear me out:
February 3, 2025 at 5:45 PM
Reposted by Mahdi Karabiben
What they don't tell you about build vs buy:
Stop Trying To Schedule A Call With Me
Stop Trying To Schedule A Call With Me - The harassment by SaaS
matduggan.com
January 18, 2025 at 2:40 PM
Great news for dbt users (esp. dbt Core users who haven't gotten a "significant" upgrade in a long while) - many workflows & use cases will become possible or much easier to implement.

Now it's a matter of execution & focusing on the right issues - but the data space is starting 2025 on a high note!
January 14, 2025 at 6:14 PM
Earlier this year, I transitioned to Product Management after 7 years in Data Engineering. What surprised me the most when making the switch is the lack of content on navigating career pivots within tech, even though there's an abundance of "tool X vs. Y" content. 1/4
December 30, 2024 at 1:25 PM
Data folks as we join Bluesky en masse this week 🥹
October 27, 2024 at 2:53 PM