Pipeline To Insights
@pipeline2insights.bsky.social
Started by two Data Engineers, Pipeline To Insights shares knowledge, tutorials, and experiences to help others grow in the data industry. We focus on learning, collaboration, and making complex data/AI topics easy to understand. Substack: pipeline2insights.substack.com
Reposted by Pipeline To Insights
Mistakes help us grow, whether ours or others'. In data, small errors can cause big issues like broken pipelines or high costs. These lessons aren’t just for data engineers; they benefit anyone working with data.
#datasky
#dataBS
Common Data Engineering mistakes and how to avoid them
From broken pipelines to unexpected cloud costs, learn from real-world mistakes and lessons to level up your data engineering skills.
pipeline2insights.substack.com
March 28, 2025 at 3:33 AM
Data Compression in SQL
In this post, we’ll explore:
- What is Data Compression?
- Benefits of Data Compression in SQL.
- Types of Compression in SQL.
- Comparison of SQL Compression Techniques.
- When to Use Compression and When to Avoid It.
#databs #datasky
Data Compression in SQL
How to Store More and Query Faster in SQL
pipeline2insights.substack.com
March 15, 2025 at 3:08 AM
What is Zero-ETL? What isn’t it?
Read more here:
open.substack.com/pub/pipeline...

#dataBS
#datasky
February 4, 2025 at 9:59 PM
In this post, we will cover:

- What is Data Vault 2.0
- Common Data Vault interview questions
- Bridging the fundamental models to Data Vault
- A Case Study: Converting a Dimensional Model to a Data Vault

#dataBS #datasky
Week 6/33: Data Modelling for Data Engineering Interviews (Part #3)
What is Data Vault 2.0 and its role in Data Engineering Interviews
open.substack.com
January 18, 2025 at 10:34 AM
Explore key data modelling concepts for databases and data warehouses, including:
- Normalisation vs. Denormalisation
- 3NF
- Dimensional Modelling
- Star vs. Snowflake Schema comparisons.
#databs
#datasky
Data Modelling Fundamentals: Normalisation, 3NF and Dimensional Modelling
Normalisation, 3NF, and dimensional modelling, with insights into Star and Snowflake schemas for efficient database and warehouse design
open.substack.com
January 11, 2025 at 12:16 PM
11 Storage Formats for Data Engineers
Efficient data starts with the right storage format. Explore 11 formats every data engineer should know to match workloads and scale seamlessly.
Highlights: Row & Columnar, Key-Value, Document, Graph, Time-Series, Hybrid.

#databs
#datasky
11 Storage Formats for Data Engineers
How to leverage storage formats for efficient and scalable data systems
pipeline2insights.substack.com
January 4, 2025 at 10:31 PM
Data ingestion with dlt and Dagster: An end-to-end pipeline tutorial

Curious, like us, to see what people are sharing under #dataBS and #datasky? Check out this post to learn how to do it using dlt!
@matthausk.bsky.social
@datateam.bsky.social
@hgeren.bsky.social
@hopefanhe.bsky.social
#dlt
Data ingestion with dlt and Dagster: An end-to-end pipeline tutorial
Ingest Data from Bluesky API to AWS S3 Using dlt and deploy it on Dagster in Just 15 Minutes.
open.substack.com
December 19, 2024 at 11:00 AM
Week 6 of '100 Days of SQL Optimisation':
Focused on DuckDB, leveraging columnar storage, sorted data, temp tables, Parquet, and optimal data types to boost efficiency. See how in-memory execution and smart structures enhance query performance!

@duckdb.org
#dataBS
#datasky
#duckdb
Week #6: 100 Days of SQL Optimisation
Exploring DuckDB and Its Capabilities
open.substack.com
December 17, 2024 at 1:24 PM
Protect data with encryption, access controls, and monitoring. Safeguard credentials, apply least privilege, store only essential sensitive data, and ensure cloud security with IAM and encryption. Build a culture of security beyond compliance.
@joereis.bsky.social
Security Fundamentals for Data Engineers
The Role of Security in the Data Engineering Lifecycle
open.substack.com
December 11, 2024 at 4:28 AM
We are starting a 32-week Data Engineering Interview Guide program, covering everything from fundamentals to advanced topics, with sessions every Saturday.
Do you think we're missing any critical topics? We're curious about your opinions😊
#dataBS
#datasky
Week 0/32 - A Comprehensive Data Engineering Interview Preparation Guide
Join us every Saturday on This New Journey
open.substack.com
December 8, 2024 at 11:06 AM
As a Data Engineer, understanding the data storage lifecycle and data retention policies is critical for designing efficient, cost-effective, and compliant data systems.
@joereis.bsky.social
#dataBS #datasky

substack.com/@pipeline2in...
December 4, 2024 at 12:11 PM
In our new post, we've covered 10 of the most popular data pipeline design patterns.

We’d love to hear your thoughts. For more details, please check out the full post written by @hgeren.bsky.social and @hopefanhe.bsky.social: open.substack.com/pub/pipeline...

#dataBS #datasky
10 Pipeline Design Patterns for Data Engineers
How to leverage Design Patterns for scalable and efficient data pipelines
open.substack.com
December 3, 2024 at 10:19 AM
Discover how dlt simplifies data ingestion.
Learn its origins and real-world use cases. Follow a step-by-step guide to build your first pipeline and join the growing dlt community!
@matthausk.bsky.social
@datateam.bsky.social
@hgeren.bsky.social
@hopefanhe.bsky.social

#dataBS #datasky
Introduction to data load tool (dlt): A Python Library for Simple Data Ingestion
Discover the basics of dlt and its role in modern data engineering workflows
open.substack.com
December 1, 2024 at 10:44 AM
Hi, wishing everyone a great Thanksgiving!

Recently we wrote about how SQL queries are executed behind the scenes.

If you are interested, check out our post: open.substack.com/pub/pipeline...

#dataBS #datasky
November 28, 2024 at 12:23 PM
Reposted by Pipeline To Insights
Just joined and heard #dataBS and #datasky are where the cool kids hang.

Wanted to introduce our blog where we regularly write about Data Engineering concepts, news, and tools.

pipeline2insights.substack.com
November 6, 2024 at 12:49 PM
Storage is at the heart of Data Engineering.
In this post, we explore the hierarchy of data storage from the ground up, drawing inspiration from Fundamentals of Data Engineering by @joereis.bsky.social and Matt Housley, as well as insights from the DE Professionals on Coursera.
#dataBS #datasky
Storage Fundamentals For Data Engineers
Why organised and durable storage is the cornerstone of Data Engineering?
open.substack.com
November 26, 2024 at 10:59 AM