Kit Menke
kitmenke.com
@kitmenke.com
Data Engineering leader in Saint Louis, STL Big Data I.D.E.A. meetup organizer, lifelong learner and teacher. He / him
#dataBS
Perhaps using unnest?
select unnest(value, recursive := true) from read_json('~/Data/example.json')
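Roughly what DuckDB's `unnest(..., recursive := true)` does to nested structs, sketched in plain Python — the sample record and helper name here are hypothetical, not from the thread, and DuckDB's own column-naming rules may differ:

```python
import json

def unnest_recursive(value, prefix=""):
    """Sketch of recursive struct flattening: nested object fields
    become top-level columns, joined here with a dot for readability."""
    flat = {}
    for key, val in value.items():
        name = f"{prefix}{key}"
        if isinstance(val, dict):
            # Recurse into nested structs, carrying the parent name as a prefix
            flat.update(unnest_recursive(val, prefix=f"{name}."))
        else:
            flat[name] = val
    return flat

record = json.loads('{"id": 1, "user": {"name": "kit", "site": "kitmenke.com"}}')
print(unnest_recursive(record))
# {'id': 1, 'user.name': 'kit', 'user.site': 'kitmenke.com'}
```

This only illustrates the struct-flattening part; in DuckDB, `unnest` on a list additionally explodes it into one row per element.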
November 6, 2025 at 8:02 PM
My blogging motivation has declined a lot over the years... then the endless Hugo breaking changes pretty much killed it for good. Do you know how much work it would be to convert a Hugo blog over to Zola?
September 1, 2025 at 4:56 AM
I met my wife on xanga! ♥️
April 25, 2025 at 8:52 PM
Chispa has good diffs for PySpark dataframes
github.com/MrPowers/chi...
GitHub - MrPowers/chispa: PySpark test helper methods with beautiful error messages
February 25, 2025 at 1:26 PM
I had an HDMI KVM but it was still annoying to switch back and forth. Plus I wanted to use the full resolution at 144Hz on my gaming PC. Now I just have a big desk with separate keyboards/mice/monitors.
February 5, 2025 at 8:59 PM
Yes, I'm working on this right now and talking about how we can potentially "upgrade" some of the dimensions without breaking everything. 🙃
February 5, 2025 at 8:50 PM
Thanks for the input and I agree... Right now I'm battling a monorepo used by a big team with limited git knowledge and no tooling. Choosing a tool like dbt/Flyway/Liquibase could help force some standardization.
February 4, 2025 at 8:40 PM
Do you ever feel like it is difficult to keep them in sync with what is deployed to the database? Or with many people working in the same repo?
February 4, 2025 at 6:46 PM
Is keeping the table definition valuable for only certain databases? For example in Databricks you can easily get the definition and there aren't any indexes to store. Compared to SQL Server (or similar) where it is difficult to figure out what was deployed.
February 4, 2025 at 4:24 PM
You did this only for certain breaking changes, right? For example, the meaning of the data in a column changed, or columns were removed. How did you maintain two separate versions of the schema?
February 4, 2025 at 4:22 PM
In my experience, companies using Spark are switching from Scala to Python for two reasons: Python has an easier learning curve, and Scala devs are much harder to find.
January 14, 2025 at 6:43 PM