Data Code 101
datacode101.bsky.social
Data / Software Engineering
Token-Oriented Object Notation (TOON) is a compact, human-readable serialization format designed for passing structured data to Large Language Models with significantly reduced token usage. It's intended for LLM input as a lossless, drop-in representation of JSON data.
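
To show where the token savings might come from, here is a minimal sketch loosely modeled on the tabular layout TOON uses for uniform arrays of objects. It is not the official encoder: `encode_table` is a hypothetical helper that assumes non-empty, flat, uniform rows and ignores quoting, escaping, and nesting.

```python
import json


def encode_table(key, rows):
    """Encode a non-empty, uniform list of flat dicts as one header line plus one row per item."""
    if not rows:
        raise ValueError("this sketch only handles non-empty lists")
    fields = list(rows[0].keys())
    # The field names are written once, so every row must share them.
    if any(list(row.keys()) != fields for row in rows):
        raise ValueError("rows are not uniform")
    header = key + "[" + str(len(rows)) + "]{" + ",".join(fields) + "}:"
    lines = ["  " + ",".join(str(row[f]) for f in fields) for row in rows]
    return "\n".join([header] + lines)


users = [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]
compact = encode_table("users", users)
as_json = json.dumps({"users": users})
# The compact form avoids repeating every key for every row.
print(compact)
print(len(compact), "characters vs", len(as_json), "for JSON")
```

Character count is only a rough proxy for token count; the actual saving depends on the tokenizer and on how uniform the data is.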

#dataengineering #llm
November 6, 2025 at 6:01 AM
RAG Stack

Building with Retrieval-Augmented Generation (RAG) isn't just about choosing the right LLM. It's about assembling an entire stack—one that's modular, scalable, and future-proof.
#ai #rag #dataengineering
October 27, 2025 at 10:36 AM
ETL vs ELT vs EtLT

All three methods begin with Extract (E) and end with Load (L), but the placement of transformation dictates their suitability for different infrastructure, data types, and business needs.
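
The difference in where the transformation sits can be sketched in a few lines. Everything here is hypothetical: the sample rows, the helper names, and the dictionaries standing in for a warehouse.

```python
def extract():
    # Hypothetical source rows; a real pipeline would read from an API, file, or database.
    return [
        {"email": " Ann@Example.com ", "amount": "10.50"},
        {"email": "bob@example.com", "amount": "4.25"},
    ]


def light_clean(rows):
    # The small "t": cheap row-level fixes such as trimming and lowercasing.
    return [{**row, "email": row["email"].strip().lower()} for row in rows]


def heavy_transform(rows):
    # The big "T": typing and aggregation for analytics.
    return {
        "total": sum(float(row["amount"]) for row in rows),
        "customers": len({row["email"] for row in rows}),
    }


def etl():
    # Transform fully in the pipeline, then load only the finished result.
    return {"loaded": heavy_transform(light_clean(extract()))}


def elt():
    # Load the raw rows untouched, then transform inside the warehouse.
    raw = extract()
    return {"loaded": raw, "modeled": heavy_transform(raw)}


def etlt():
    # Clean lightly before loading, then do the heavy modeling after the load.
    staged = light_clean(extract())
    return {"loaded": staged, "modeled": heavy_transform(staged)}
```

In this sketch only `elt()` and `etlt()` keep row-level data in the warehouse, which is what lets them be re-modeled later without re-extracting.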

#dataengineering
October 19, 2025 at 4:45 AM
RAG stands for Retrieval-Augmented Generation. RAG helps reduce hallucinations in LLMs by providing them with relevant context from external knowledge sources.

Understanding how RAG works from scratch is important for AI/ML Engineers.
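
A from-scratch sketch of the retrieve-then-augment step, assuming a toy bag-of-words similarity in place of a real embedding model and vector store. The documents, function names, and prompt wording are all illustrative, and the final call to an LLM is left out.

```python
import math
import re
from collections import Counter

DOCS = [
    "ETL transforms data before loading it into the warehouse.",
    "RAG retrieves relevant context and adds it to the LLM prompt.",
    "OLAP systems are built for large analytical queries.",
]


def vectorize(text):
    # Toy "embedding": lowercase word counts.
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question, docs, k=1):
    # Rank documents by similarity to the question and keep the top k.
    query = vectorize(question)
    ranked = sorted(docs, key=lambda doc: cosine(query, vectorize(doc)), reverse=True)
    return ranked[:k]


def build_prompt(question, docs, k=1):
    # Augment the question with the retrieved context before sending it to an LLM.
    context = "\n".join(retrieve(question, docs, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


print(build_prompt("How does RAG add context to the prompt?", DOCS))
```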
#dataengineering #rag #aiengineering #llm
October 17, 2025 at 1:25 PM
SQL is declarative: you tell the system what you want, and its clauses are the “verbs” that describe the action you want performed on the data. This is the logical order in which they are evaluated behind the scenes.
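
A small sketch that walks one query through the commonly cited logical clause order (FROM, WHERE, GROUP BY, HAVING, SELECT, ORDER BY, LIMIT), using plain Python lists instead of a database. The table and query are made up, and a real engine's optimizer may reorder the physical work.

```python
from collections import defaultdict

orders = [
    {"customer": "ann", "amount": 30},
    {"customer": "bob", "amount": 5},
    {"customer": "ann", "amount": 20},
    {"customer": "cy", "amount": 40},
    {"customer": "bob", "amount": 60},
]


def run_query(rows):
    # SELECT customer, SUM(amount) AS total FROM orders
    # WHERE amount >= 10 GROUP BY customer HAVING SUM(amount) > 40
    # ORDER BY total DESC LIMIT 2
    source = list(rows)                                           # 1. FROM
    filtered = [r for r in source if r["amount"] >= 10]           # 2. WHERE
    groups = defaultdict(list)                                    # 3. GROUP BY
    for r in filtered:
        groups[r["customer"]].append(r["amount"])
    kept = {c: a for c, a in groups.items() if sum(a) > 40}       # 4. HAVING
    selected = [{"customer": c, "total": sum(a)} for c, a in kept.items()]  # 5. SELECT
    ordered = sorted(selected, key=lambda r: r["total"], reverse=True)      # 6. ORDER BY
    return ordered[:2]                                            # 7. LIMIT


print(run_query(orders))
```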
October 7, 2025 at 12:21 PM
AI Agent Frameworks

The framework shapes how your agent thinks, acts, and connects to tools and data. LLMs are the brain; frameworks are the wiring that connects the different parts.

Image by /in/rakeshgohel01
September 16, 2025 at 3:15 PM
While operational databases are the engines running your day-to-day applications, large-scale analytical systems are designed not for rapid, small transactions, but for complex, large-scale queries and aggregations unlocking insights from vast amounts of historical information.

#DataEngineer #OLAP
September 15, 2025 at 10:46 PM
Understanding the Types of Databases

Choosing the right database is a critical architectural decision. Each type is a specialized tool designed for a specific job.

Here’s a breakdown of the essentials:
September 15, 2025 at 10:20 PM
Big Data Pipelines Across AWS, Azure & GCP

As data engineers, we often work across different cloud platforms. While the concepts stay the same (ingestion ➝ storage ➝ compute ➝ warehouse ➝ visualization), the tools differ.
September 15, 2025 at 10:07 PM
JavaScript introduced async/await in ECMAScript 2017. It is syntactic sugar built on top of Promises that makes asynchronous code easier to write and read by avoiding long .then() chains. It is now the standard way to handle asynchronous operations like API calls, file system access, or database queries. #javascript
September 3, 2025 at 1:37 AM
Rust provides async/await for writing efficient, non-blocking code without sacrificing its core principles of safety and performance. It's a key feature for building scalable network services and concurrent applications. #rust
September 3, 2025 at 1:37 AM
Python introduced async and await syntax in version 3.5, building upon the asyncio library. It's crucial for writing high-performance, I/O-bound applications in the Python ecosystem. #python
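
A minimal sketch of the syntax, with `asyncio.sleep` standing in for real I/O waits. The `fetch` coroutine and its delays are hypothetical.

```python
import asyncio


async def fetch(name, delay):
    # asyncio.sleep stands in for a network or disk wait.
    await asyncio.sleep(delay)
    return f"{name} done"


async def main():
    # gather runs the coroutines concurrently and returns results in argument order.
    return await asyncio.gather(fetch("a", 0.2), fetch("b", 0.1), fetch("c", 0.2))


results = asyncio.run(main())
print(results)
```

Because the three waits overlap, the whole run takes roughly as long as the slowest call rather than the sum of all three.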
September 3, 2025 at 1:37 AM
C# was one of the early and influential adopters of the async/await pattern, introducing it in version 5.0 (released in 2012). Its implementation heavily influenced many other languages. It's built on the Task Parallel Library (TPL).
#csharp
September 3, 2025 at 1:37 AM
Kiro vs Cursor: How Amazon’s AI IDE Is Redefining Developer Productivity
#programming #development #software #engineering #ai #vscode #agentic
July 22, 2025 at 8:41 PM
Getting started with parallelizable sum of numbers
February 7, 2025 at 4:54 PM
One Big Table (OBT) is a data modeling approach that emphasizes the use of a single, wide table to store and manage data for analytics.
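
As a rough illustration, the sketch below flattens three small normalized tables into one wide table so that analytics queries need no joins. The tables, column names, and helpers are invented for the example.

```python
customers = {
    1: {"name": "Ann", "country": "NZ"},
    2: {"name": "Bob", "country": "US"},
}
products = {
    10: {"product": "Keyboard", "category": "Hardware"},
    11: {"product": "Editor license", "category": "Software"},
}
orders = [
    {"order_id": 100, "customer_id": 1, "product_id": 10, "amount": 80},
    {"order_id": 101, "customer_id": 2, "product_id": 11, "amount": 30},
    {"order_id": 102, "customer_id": 1, "product_id": 11, "amount": 20},
]


def build_obt(orders, customers, products):
    # Join once at build time and copy every dimension attribute onto the fact row.
    wide = []
    for order in orders:
        row = {"order_id": order["order_id"], "amount": order["amount"]}
        row.update(customers[order["customer_id"]])
        row.update(products[order["product_id"]])
        wide.append(row)
    return wide


def revenue_by(rows, column):
    # Any attribute can be grouped on directly, with no join at query time.
    totals = {}
    for row in rows:
        totals[row[column]] = totals.get(row[column], 0) + row["amount"]
    return totals


obt = build_obt(orders, customers, products)
print(revenue_by(obt, "country"))
print(revenue_by(obt, "category"))
```

The trade-off is duplication: each customer and product attribute is repeated on every order row, in exchange for simpler and often faster analytical queries.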

#dataengineering #datamodeling
January 16, 2025 at 4:43 PM
"AI Engineering" covers the process of building applications with readily available foundation models, explaining what AI engineering is and how it differs from traditional machine learning engineering.
January 14, 2025 at 6:01 PM
"Data Mesh" guides practitioners, architects, technical leaders, and decision makers on their journey from traditional big data architecture, data warehouses and lakes to a distributed and multidimensional approach to analytical data management
January 14, 2025 at 5:43 PM
"Designing Data-Intensive Applications" is a practical and comprehensive guide examining the pros and cons of various technologies for processing and storing data, showing how to make full use of data in modern applications.
January 14, 2025 at 5:40 PM
"Fundamentals of Data Engineering" is a practical book about the data engineering lifecycle, walking through the concepts of data generation, ingestion, orchestration, transformation, storage, and governance that are critical in any data environment, regardless of the underlying technology.
January 14, 2025 at 5:38 PM
"Data Engineering Best Practices" explores best practices for performance, cost-effectiveness, and technology choices, and shows how to avoid common pitfalls.
January 14, 2025 at 5:35 PM
Brief History of Big Data 🧵
October 23, 2024 at 10:44 PM
"Football Analytics with Python and R" by Eric Eager and Richard Erickson. Teaches how to use data science to analyze American football using both Python and R.
#datascience #analytics #python
October 4, 2024 at 6:14 PM
SQL Mindmap by @brijpandeyji
DDL, DML, DCL, GROUPING, ORDERING, FUNCTIONS, WINDOW FUNCTIONS
#sql
October 4, 2024 at 5:55 PM
Timeline of digital transformation

Digital transformation is about the integration of digital technologies into all areas of a business.
#digital #transformation
September 24, 2024 at 3:05 AM