Hugo Lu
datajesus.bsky.social
CEO of Orchestra - Unified Control Plane for Data Pipelines
Migrating from a legacy system should be done incrementally, but you should get buy-in.

The worst is buying all your tools without a plan.

Look at the guide we wrote for #sqlServer

tinyurl.com/bddvtku5
Top Considerations when considering a migration from SQL Server to Cloud
How to avoid the most common enterprise architecture pitfalls
tinyurl.com
January 4, 2025 at 7:33 AM
Hadoop, Spark and Iceberg are not alternatives. They are the same thing evolving.
#apachespark #apacheiceberg #opentableformat
December 31, 2024 at 10:07 AM
ELT / Data Pipeline architecture for Fabric + databricks
medium.com/@hugolu87/el...

#fabric #msfabric #databricks #elt
ELT with Fabric, Azure and Databricks
Data Pipeline Patterns for 2025 and beyond
medium.com
December 23, 2024 at 10:05 PM
4. Automatically adding tests

This bit is really good as it means you don't need to spend any time thinking about how to write custom #dbtmacros to see what works

Check out the video on YouTube

www.youtube.com/watch?v=s-Xx...

#dbt #databuildtool #dbtpoweruser
December 18, 2024 at 9:32 AM
2. Auto generation of .yml files.

If, like me, you find writing .yml really tedious, then you can automatically generate an entire schema using this extension

3. Automatically generating docs

If you also #hate writing documentation then you can rinse someone else's #OpenAI credits
Introduction to dbt Power User. Using dbt-core and not using this, you're missing out. #dbt
A lot of the time, writing dbt / data build tool code can be really arduous, tedious and boring. Nobody wants to be defining custom macro after custom macro and writing terse .yml files just to get by. Fortunately the folks at Altimate.ai have you covered. In this tutorial Hugo Lu shows you how to install dbt Power User and how to get started with it. Specifically, there are a few things we really like: 1. It is free (for now). 2. You can leverage the extension to automatically generate documentation. 3. You can leverage it to autopopulate schema. 4. You can leverage it to autopopulate tests. 5. You can use the API Key method to explore column-level lineage. All for free. Quite frankly we aren't sure what Altimate's game is here, especially because the extension is free and clearly includes GPT credits under the hood. So leverage it while...
www.youtube.com
December 18, 2024 at 9:30 AM
(1/many) If you're not using dbt power user and you use #dbtcore you should be. WHY?

1. COLUMN-LEVEL LINEAGE (free)

You can visualise column-level lineage for easy exposition and data re-#architecture in the platform, no problem
December 18, 2024 at 9:29 AM
#msfabric data architecture for 2025 in this link below. Must read for anyone building data pipelines in the azure stack

www.getorchestra.io/whitepaper/m...
Microsoft Fabric Reference Architecture: MS Fabric in 2025 | Orchestra
ELT using Microsoft Fabric has never been simpler with this standard reference architecture.
www.getorchestra.io
December 14, 2024 at 10:57 AM
At ThoughtSpot Embed - these guys are absolute pros. Driving revenue for businesses through data and embedded #analytics. This is not a fad
December 11, 2024 at 1:39 PM
It is concerning that orchestration often *doesn't* come up when data teams speak to my friends that run smaller ELT companies.

Do you really think you can get away with roguing it out? It's only a matter of time before you get found out as an amateur. #dataorchestration #datang
December 9, 2024 at 1:53 PM
The phrase data observability is meaningless and kinda hard to pronounce.

Whatever happened to just all round decent architecture?

medium.com/@hugolu87/le...

#dataquality #dataengineering
Let’s never use the phrase Data Observability Ever Again
No-one even knows what it is, let alone pronounce it
medium.com
December 8, 2024 at 8:47 PM
Many people who use #dbt don't realise you can prevent missing periods in your datasets if you just bother to write this single test
www.youtube.com/watch?v=e09U...

#dbt #dataquality #orchestra #datajesus #cometojesus
How to PREVENT incomplete data using dbt tests and Orchestra | dbt test tutorial #dataquality
YouTube video by Orchestra
www.youtube.com
December 8, 2024 at 1:14 PM
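The test itself lives in the linked video and is not reproduced here. As a rough illustration of the idea only, a missing-period check boils down to comparing the dates you actually loaded against the dates you expected. The sketch below is plain Python, not dbt; the function name `find_missing_days` and the sample dates are hypothetical.

```python
from datetime import date, timedelta


def find_missing_days(observed, start, end):
    """Return every day in [start, end] that does not appear in `observed`."""
    seen = set(observed)
    n_days = (end - start).days + 1
    expected = (start + timedelta(days=i) for i in range(n_days))
    return [day for day in expected if day not in seen]


# Hypothetical load dates: 3 December and 5 December never arrived.
loaded = [date(2024, 12, 1), date(2024, 12, 2), date(2024, 12, 4)]
missing = find_missing_days(loaded, date(2024, 12, 1), date(2024, 12, 5))

# A pipeline would fail the run here instead of just printing.
print(missing)
```

In a real project the same comparison would normally run inside the warehouse as a test, so the pipeline stops before incomplete data reaches a dashboard.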
Data Job Market is carnage now. Why?
- excess supply “data engineering is so hot right now”
- excess supply “people switching during ZIRP as it was easy and well paid”
- not enough demand “Data team is a cost centre”
- massive investment in SaaS
What did I miss? #dataengineering #jobmarket
November 27, 2024 at 7:32 PM
Check out how #orchestra is changing the game for data teams. Our most powerful integration yet

medium.com/@hugolu87/ou...

#python #orchestra #dataengineering
#cometojesus
Our most powerful integration yet: native python support
Code based utility and python execution from within orchestra
medium.com
November 27, 2024 at 7:10 PM
What's up it's me data jesus
November 26, 2024 at 1:24 PM