Kit Menke
kitmenke.com
@kitmenke.com
Data Engineering leader in Saint Louis, STL Big Data I.D.E.A. meetup organizer, lifelong learner and teacher. He / him
#dataBS
Perhaps using unnest?
select unnest(value, recursive := true) from read_json('~/Data/example.json')
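Roughly what DuckDB's `unnest(..., recursive := true)` does to nested structs, sketched in plain Python — the sample record and helper name here are hypothetical, not from the thread, and DuckDB's own column-naming rules may differ:

```python
import json

def unnest_recursive(value, prefix=""):
    """Sketch of recursive struct flattening: nested object fields
    become top-level columns, joined here with a dot for readability."""
    flat = {}
    for key, val in value.items():
        name = f"{prefix}{key}"
        if isinstance(val, dict):
            # Recurse into nested structs, carrying the parent name as a prefix
            flat.update(unnest_recursive(val, prefix=f"{name}."))
        else:
            flat[name] = val
    return flat

record = json.loads('{"id": 1, "user": {"name": "kit", "site": "kitmenke.com"}}')
print(unnest_recursive(record))
# {'id': 1, 'user.name': 'kit', 'user.site': 'kitmenke.com'}
```

This only illustrates the struct-flattening part; in DuckDB, `unnest` on a list additionally explodes it into one row per element.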
November 6, 2025 at 8:02 PM
My blogging motivation has declined a lot over the years... then the endless Hugo breaking changes pretty much killed it for good. Do you know how much work it would be to convert a Hugo blog over to Zola?
September 1, 2025 at 4:56 AM
I met my wife on xanga! ♥️
April 25, 2025 at 8:52 PM
Chispa has good diffs for PySpark dataframes
github.com/MrPowers/chi...
GitHub - MrPowers/chispa: PySpark test helper methods with beautiful error messages
February 25, 2025 at 1:26 PM
I had an HDMI KVM but it was still annoying to switch back and forth. Plus I wanted to use the full resolution at 144Hz on my gaming PC. Now I just have a big desk with separate keyboards/mice/monitors.
February 5, 2025 at 8:59 PM
Yes, I'm working on this right now and talking about how we can potentially "upgrade" some of the dimensions without breaking everything. 🙃
February 5, 2025 at 8:50 PM
Thanks for the input and I agree... Right now I'm battling a monorepo used by a big team with limited git knowledge and no tooling. Choosing a tool like dbt/Flyway/Liquibase could help force some standardization.
February 4, 2025 at 8:40 PM
Do you ever feel like it is difficult to keep them in sync with what is deployed to the database? Or with many people working in the same repo?
February 4, 2025 at 6:46 PM
Is keeping the table definition valuable for only certain databases? For example in Databricks you can easily get the definition and there aren't any indexes to store. Compared to SQL Server (or similar) where it is difficult to figure out what was deployed.
February 4, 2025 at 4:24 PM
You did this only for certain breaking changes, right? For example, the meaning of the data in a column changed, or columns were removed. How did you maintain two separate versions of the schema?
February 4, 2025 at 4:22 PM
In my experience, companies using Spark are switching from Scala to Python for two reasons: Python has an easier learning curve, and Scala devs are much harder to find.
January 14, 2025 at 6:43 PM