Gabriel, @thedatasitter
thedatasitter.com
@thedatasitter.com
Taming Data Tantrums.
Senior Data Engineer @ Caylent (2024 AWS Partner of the Year)
-
subscribe to https://thedatasitter.substack.com/welcome
🇧🇷
-
opinions are my own
How do you guarantee you won't have any bad rows tainting your source of truth?

Your answer is here:
thedatasitter.substack.com/p/netflixs-w...
Netflix's WAP: guarantee data quality [Under The Hood #2]
Write, Audit and Publish your data to minimize those tantrums.
thedatasitter.substack.com
March 11, 2025 at 1:56 PM
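A minimal sketch of the general Write-Audit-Publish idea. This is not the article's or Netflix's implementation. The `Warehouse` class, the `write`/`audit`/`publish` helpers, and the two checks (missing or duplicate `id`, negative `amount`) are hypothetical. Plain Python lists stand in for what would normally be a staging table or a table branch:

```python
from dataclasses import dataclass, field


@dataclass
class Warehouse:
    # Hypothetical in-memory stand-ins for a production table and a staging area.
    production: list[dict] = field(default_factory=list)
    staging: list[dict] = field(default_factory=list)


def write(wh: Warehouse, rows: list[dict]) -> None:
    """Write: land the new batch in staging, never directly in production."""
    wh.staging = list(rows)


def audit(rows: list[dict]) -> list[str]:
    """Audit: collect problems; an empty list means the batch passes."""
    problems = []
    seen_ids = set()
    for i, row in enumerate(rows):
        if row.get("id") is None:
            problems.append(f"row {i}: missing id")
        elif row["id"] in seen_ids:
            problems.append(f"row {i}: duplicate id {row['id']}")
        else:
            seen_ids.add(row["id"])
        if row.get("amount") is not None and row["amount"] < 0:
            problems.append(f"row {i}: negative amount")
    return problems


def publish(wh: Warehouse) -> bool:
    """Publish: promote staging to production only if the audit passes."""
    if audit(wh.staging):
        # The bad batch stays quarantined in staging; production is untouched.
        return False
    wh.production.extend(wh.staging)
    wh.staging = []
    return True


wh = Warehouse()
write(wh, [{"id": 1, "amount": 10}, {"id": 2, "amount": 5}])
print(publish(wh))  # clean batch goes live
write(wh, [{"id": 3, "amount": -1}])
print(publish(wh))  # bad batch never reaches production
```

The point of the pattern is the ordering: consumers only ever read production, so a batch that fails the audit is invisible to them until someone fixes it.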
Nevertheless, the spirit of the division is the same: at the end, you'll have your go-to production data, the gold or analytics layer.

These are supposed to be the most trusted, up-to-date, clean, and good-looking tables.
March 11, 2025 at 1:56 PM
AWS names them “raw, staging, and analytics.”

Databricks likes the medallion convention of “bronze, silver, and gold.”

In some scenarios, you would add a layer for events. Some call them “transient,” others “landing.”
March 11, 2025 at 1:56 PM
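To make the layer split concrete, here is a hypothetical sketch of data moving raw (bronze) → staging (silver) → analytics (gold). The function names, the order fields, and the cleaning rules are illustrative assumptions, not an AWS or Databricks API. An events/landing layer would sit before raw and isn't modeled here:

```python
def to_staging(raw_rows):
    """raw (bronze) -> staging (silver): drop malformed rows, normalize types, dedupe."""
    cleaned = {}
    for row in raw_rows:
        try:
            order_id = int(row["order_id"])
            amount = float(row["amount"])
        except (KeyError, TypeError, ValueError):
            continue  # malformed record: it stays in raw only
        # Keying by order_id deduplicates; the last occurrence wins.
        cleaned[order_id] = {
            "order_id": order_id,
            "country": str(row.get("country", "")).strip().upper(),
            "amount": amount,
        }
    return list(cleaned.values())


def to_analytics(staging_rows):
    """staging (silver) -> analytics (gold): a business-level aggregate."""
    totals = {}
    for row in staging_rows:
        totals[row["country"]] = totals.get(row["country"], 0.0) + row["amount"]
    return dict(sorted(totals.items()))


raw = [
    {"order_id": "1", "country": " br ", "amount": "10.5"},
    {"order_id": "1", "country": "BR", "amount": "10.5"},  # duplicate
    {"order_id": "2", "country": "us", "amount": "4"},
    {"order_id": None, "country": "us", "amount": "7"},  # malformed
]
staging = to_staging(raw)
print(to_analytics(staging))
```

Raw keeps everything as it arrived, staging is where the cleaning happens, and only the gold output is what dashboards should read.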
I've tried and failed to study math for ML for 3 years, and now, finally, after starting treatment, I can concentrate on these complex topics and actually learn lol
February 27, 2025 at 4:37 PM