Kit Menke
@kitmenke.com
Data Engineering leader in Saint Louis, STL Big Data I.D.E.A. meetup organizer, lifelong learner and teacher. He / him
#dataBS
Pinned
Hello Blue sky! I am an architect and data engineer with a focus on distributed systems and real-time streaming data pipelines. In the past few years I've worked a lot with Spark, Azure, and Databricks!

Outside of work, I organize the STL Big Data I.D.E.A. meetup and do some volunteering.
Reposted by Kit Menke
While AI companies are allowed to slurp everything they want, Quad9 warns that legal fees are drowning DNS resolvers, which are now being targeted by copyright owners to enforce blocks on piracy sites

quad9.net/news/blog/wh...
Quad9 | A public and free DNS service for a better security and privacy
quad9.net
November 10, 2025 at 10:53 PM
Any meetup.com organizers that have successfully moved their community to another platform? I would be interested in hearing your experience
September 26, 2025 at 10:20 PM
Conspiracy theory: Databricks is deliberately making your clusters start up super slowly so that you want to pay more to use serverless. #dataBS
August 28, 2025 at 2:13 PM
I've been working on visualizing JOINs for some beginner SQL workshops. Here is LEFT JOIN. Thoughts? #databs youtu.be/ZSxtZAulogo?...
Visualizing a SQL LEFT JOIN
YouTube video by Kit Menke
youtu.be
June 10, 2025 at 11:51 AM
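For readers without the video handy, here is a minimal sketch of what a LEFT JOIN returns, using Python's built-in sqlite3 module. The `customers` and `orders` tables and their rows are made up for illustration and are not taken from the workshop material.

```python
import sqlite3


def left_join_demo():
    """Return (customer name, order item) pairs produced by a LEFT JOIN."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical sample data: Linus has no orders on purpose.
    conn.executescript(
        """
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, item TEXT);
        INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace'), (3, 'Linus');
        INSERT INTO orders VALUES (10, 1, 'keyboard'), (11, 1, 'mouse'), (12, 2, 'monitor');
        """
    )
    rows = conn.execute(
        """
        SELECT c.name, o.item
        FROM customers AS c
        LEFT JOIN orders AS o ON o.customer_id = c.id
        ORDER BY c.id, o.id
        """
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for name, item in left_join_demo():
        # Customers with no matching order still appear, with item = None.
        print(name, item)
```

Every row from the left table (`customers`) is kept. A customer with several orders appears once per order, and a customer with none appears once with a NULL (`None` in Python) in the order columns.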
Moving from Azure Data Studio to VSCode but ugh... the VSCode SQL Server extensions are so frustrating to use.
May 21, 2025 at 1:37 PM
Reposted by Kit Menke
Couple of big announcements from @cloudflare.social today for folk in #dataBS:

* Acquisition of Arroyo, launch of Pipelines for streaming ingestion: blog.cloudflare.com/cloudflare-a...
* Launch of R2 Data Catalog—a managed Apache Iceberg catalog for R2 blog.cloudflare.com/r2-data-cata...
Just landed: streaming ingestion on Cloudflare with Arroyo and Pipelines
We’ve just shipped our new streaming ingestion service, Pipelines — and we’ve acquired Arroyo, enabling us to bring new SQL-based, stateful transformations to Pipelines and R2.
blog.cloudflare.com
April 10, 2025 at 2:50 PM
Databricks recently changed the default notebook format from "source" (.py, .sql, .scala) to IPYNB which seems to indicate they will be getting rid of the source format. IMO, the ipynb format brings a few issues like difficult diffs and the potential to leak data learn.microsoft.com/en-us/azure/...
December 2024 - Azure Databricks
December 2024 release notes for new Azure Databricks features and improvements.
learn.microsoft.com
February 19, 2025 at 8:07 PM
Reposted by Kit Menke
I found this while looking through some scratch notes. I don't now recall what the context was, but it's an interesting thought on the evolution of the data warehouse. (Though there is an equivocation embedded in this history) #databs
February 5, 2025 at 5:45 PM
Do you version your data assets? Or is there only the current version of a database table? What about the table definition? #dataBS
February 4, 2025 at 2:57 PM
An agile ceremony / rite of passage nobody mentions: arguing about story points and what they mean.
January 24, 2025 at 5:06 PM
Reposted by Kit Menke
1. Impact. How much revenue does my work protect or generate?

2. Quality. Does my work meet or exceed customer expectations?

3. Efficiency. Reward making the right buy versus build decision.

4. Reusability. How do others leverage my work?

5. Supportability. How much work do I create for others?
a question for the people who write code for money:

if you could wave a magic wand and have your performance/promotability measured on any 5 metrics of your choice, what would those metrics be?
January 23, 2025 at 4:45 PM
Reposted by Kit Menke
I'm out walking and had some thoughts about data and fun stuff and mental health that I wanted to share.

#dataBS
January 22, 2025 at 1:23 PM
Reposted by Kit Menke
Gahhh it’s time! @data-dragoness.bsky.social devUp call for speakers! Let’s take over with the #PowerPlatform and #MicrosoftFabric topics!

For anyone who’s in the middle west, let’s do this!

sessionize.com/dev-up-2025
dev up 2025: Call for Speakers
The 2025 dev up conference is being held in St. Louis, Missouri from August 6-8, 2025. We are excited to be back and we are putting out the call to ...
sessionize.com
January 14, 2025 at 8:43 PM
Great breakdown of the new S3 Tables feature that leverages Apache Iceberg. Including an explanation of the costs... which are complicated. #dataBS
bigdata.2minutestreaming.com/p/meet-your-...
meet your new data lakehouse: S3 Iceberg Tables
S3 Tables and S3 Metadata are two brutal new features that compete with common Apache Iceberg Lakehouse architectures
bigdata.2minutestreaming.com
December 6, 2024 at 8:28 AM
Reposted by Kit Menke
The more I think about yesterday's announcement about Amazon S3 Tables, the more I think that it changes things a great deal.

The gravity of data has shifted from the warehouse to cloud storage... but is there really a difference any more?

🧵1/n

www.businesswire.com/news/home/20....
Amazon S3 Expands Capabilities with Managed Apache Iceberg Tables for Faster Data Lake Analytics and Automatic Metadata Generation to Simplify Data Discovery and Understanding
At AWS re:Invent, Amazon Web Services, Inc. (AWS), an Amazon.com, Inc. company (NASDAQ: AMZN), today announced new Amazon Simple Storage Service (Amaz
www.businesswire.com
December 4, 2024 at 10:50 AM
Arch Data Network is hosting an event around Apache Airflow and how it is being used. This Thursday, December 5th from 5:30 - 7:30 pm in Creve Coeur
www.linkedin.com/events/archd...
Arch Data Network December 5th Event | LinkedIn
Apache Airflow is popular because it provides a powerful, scalable, and flexible solution for orchestrating complex data workflows and automating processes across diverse environments. Join us as we d...
www.linkedin.com
December 3, 2024 at 12:28 AM
I made a Starter Pack for people in the Saint Louis, Missouri area who are doing cool stuff in Data Engineering, Data Analytics, or Data Science. If you're doing data in STL let me know! #datasky #dataBS

go.bsky.app/SZUtRw3
November 20, 2024 at 11:07 PM
In two weeks, the St. Louis Big Data I.D.E.A. meetup is hosting @chad-isenberg.bsky.social to give an overview on dbt, alternatives, and the future of "the last mile" in data management. Beginners welcome!

🗓️ When: December 4, 2024 @ 5:30 PM
📍Where: Virtually on Zoom

RSVP below! #dataBS #datasky
[VIRTUAL] A Brief Introduction to dbt, Wed, Dec 4, 2024, 5:30 PM | Meetup
In this talk, we'll cover what dbt is, why it's useful, alternatives, and the future of "the last mile" in data management. I'll assume folks have no knowledge of dbt and m
www.meetup.com
November 20, 2024 at 5:56 PM
Reposted by Kit Menke
Foursquare just open sourced their 100 million place point of interest dataset! Some notes on poking around with it using DuckDB (it's Parquet files on S3) simonwillison.net/2024/Nov/20/...
Foursquare Open Source Places: A new foundational dataset for the geospatial community
I did not expect this! > [...] we are announcing today the general availability of a foundational open data set, Foursquare Open Source Places ("FSQ OS Places"). This base layer …
simonwillison.net
November 20, 2024 at 6:08 AM
Reposted by Kit Menke
New feed! Add #dataBS or #databsky to your post, and it'll get ingested into this custom Data BS feed, which looks back over 7 days of posts.
October 29, 2024 at 6:15 PM
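A rough sketch of the rule described in the post above: a post qualifies if it carries one of the two hashtags and is at most 7 days old. This is not the feed's actual code. The function name `in_feed`, the regex, and the case-insensitive matching are all assumptions made for illustration.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumption: tags are matched case-insensitively, so #dataBS and #databs both count.
FEED_TAGS = {"databs", "databsky"}
LOOKBACK = timedelta(days=7)
HASHTAG = re.compile(r"#(\w+)")


def in_feed(text, posted_at, now):
    """Hypothetical filter: True if the post has a feed tag and is within the lookback window."""
    tags = {tag.lower() for tag in HASHTAG.findall(text)}
    age = now - posted_at
    return bool(tags & FEED_TAGS) and timedelta(0) <= age <= LOOKBACK


if __name__ == "__main__":
    now = datetime(2024, 10, 29, 18, 15, tzinfo=timezone.utc)
    print(in_feed("Do you version your data assets? #dataBS", now - timedelta(days=1), now))
    print(in_feed("Old post #databsky", now - timedelta(days=8), now))
```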
Reposted by Kit Menke
Some work that I have been involved in over the last year. I hope you like the blog post from our lead, Soumaya, as it's a very interesting solution. Not all problems are nails to the hammer of Spark :)
ministryofjustice.github.io/data-and-ana...
Building a transaction data lake using Amazon Athena, Apache Iceberg and dbt
How we leveraged Amazon Athena, along with the Apache Iceberg table format and the dbt SQL management framework, to build robust, scalable and maintainable ELT (extract, load, transform) pipelines.
ministryofjustice.github.io
November 17, 2024 at 1:07 AM
Reposted by Kit Menke
In June Elon posted this graph of the rate of likes on X. It doesn't have a unit on the y axis, but it's plausible to assume that it's events/sec. If that is true, X in June was handling about 20k likes/sec. For comparison, Bluesky is now handling about 700 likes/sec during the busy part of the day.
November 16, 2024 at 10:46 PM
Thinking about creating a STL Data starter pack for those doing cool stuff in data engineering, data analytics, and data science...
November 16, 2024 at 1:43 PM
Reposted by Kit Menke
New post up! ✨

Exploring AT Protocol with Python to visualize the #databs social graph!

davidgasquez.com/exploring-at...

Took less than 1 hour to get the data and plot it. Amazing what you can do with open APIs and great SDKs!
November 14, 2024 at 1:39 PM
I'm all set up with SAP PowerDesigner. Next step... World domination.
November 15, 2024 at 3:33 AM