Jason Miles
edudatasci.net
Practicing Data & AI specialist in California. Opinions and views are my own.
The State of #MicrosoftFabric's #DataPlatform before Ignite 2025. Five themes to watch over the next two weeks.
Pre‑Ignite 2025: The State of the Data Platform
What to watch in the eight days before Ignite—and how to translate the signals into architecture bets that age well.
edudatasci.net
November 11, 2025 at 3:02 PM
#Agile isn’t “no plan”; it’s a thin plan + thick feedback. When dates, deliverables, #objectives, and #KPIs lock together, teams ship faster and prove impact. If your #roadmap can’t show that chain in one page, you’re sprinting in circles.
Agile Needs a Spine: Aligning Dates, Deliverables, Objectives, and KPIs
A recent conversation with a colleague reminded me how important it is to apply structure to an agile project team. Most industries can't take the option, to quote the old Blizzard line, of releasing it "when it's ready," because they've made commitments to customers or to other parts of the business. Agile frees teams from big-batch planning, but it doesn’t free them from consequences.
edudatasci.net
November 10, 2025 at 3:02 PM
Schema change isn’t a failure to control—it’s reality to choreograph. Use Fabric’s Medallion pattern to absorb change in Bronze, productize it in Silver, and deliver confidence in Gold. Governance and domains make it boring—and that’s the point. #SchemaEvolution #Lakehouse #DataProducts #DataGovernance
Making Schema Change Boring: A Short History—and How Microsoft Fabric’s Medallion Lakehouse Bakes It In
Schema changes have always been risky because a schema isn’t just columns—it’s the interface between data producers and data consumers. Historically, that interface was rigid, which made any change expensive. Modern lakehouse design solves the problem structurally: a Medallion architecture separates where variation is tolerated (Bronze) from where commitment is made (Silver) and relied upon (Gold). In Microsoft Fabric, those roles map cleanly to Lakehouse, Warehouse, and Power BI’s semantic layer, with governance and domain‑oriented (data‑product) design tying it all together.
edudatasci.net
November 7, 2025 at 3:06 PM
Enabling a #CitizenData program requires not just helping operational organizations become #DataProductOwners, but also raising the #DataLiteracy level of your organization so that #CitizenDataAnalysts can make use of the data you're providing them. Building data literacy…
Data Literacy, Citizen Analysis, and the Shift to a Data‑Enabled Culture
I’ve never been fond of the phrase data‑driven. It can imply that people should surrender the wheel to whatever the chart says. I prefer data‑enabled: a culture where evidence is visible, disputable, and useful—where humans steer and data is the headlight, not the driver. That shift doesn’t start with a platform; it starts with literacy, and it grows when more people can do a little analysis for themselves.
edudatasci.net
November 6, 2025 at 4:07 PM
If your software keeps tripping over budgets, reviews, and handoffs, the problem isn’t “process”—it’s habitat. Design for the organizational system of systems your code must live in, and let Conway’s Law work for you, not against you. #SystemOfSystems #BusinessArchitecture #ConwaysLaw #OrgDesign
Beyond Boxes and Lines: Designing for the System of Systems Your Software Must Live In
Architecture diagrams look clean—until they collide with the place your system actually lives. That place is not just “production.” It’s your organization: a dense system of systems made of teams, processes, policies, budgets, and tools. The primary reason to understand the organization a system will reside in is precisely these surrounding systems: your software must integrate with them as surely as it integrates with databases and queues.
edudatasci.net
November 5, 2025 at 3:15 PM
Wiegers and Beatty's Software Requirements (3rd Edition) is one of my favorite software books, and one that I've bought more than my fair share of copies of, because I always end up giving copies away to junior developers. In this #BookReview, I tell you a little bit about why I consider this book to…
Review: Software Requirements (3rd Edition) — the best light for a “dark art”
If software requirements really are a “black art,” Software Requirements (3rd Edition) is the field guide that turns on the lights. Karl Wiegers and Joy Beatty’s update for Microsoft Press remains the most practical, end‑to‑end treatment of requirements work I’ve seen—thorough enough for architects and business analysts, yet approachable enough to hand to a junior developer or data engineer without scaring them off.
edudatasci.net
November 4, 2025 at 4:07 PM
If your #DataStrategy starts with “ingest everything,” you’re optimizing for storage, not #Decisions.
The real change happens when you build data products that deliver clear outcomes — one decision, one promise, one #MeasurableResult at a time.
Stop managing ponds. Start shipping #DataProducts.
Build Products, Not Ponds
If you agree that data should be judged by the decisions it improves, the way you build with data changes. You stop trying to pour every table from every system into one giant lake and assume value will appear later. Instead, you ship small, finished data products that help with one decision at a time. The first approach optimizes for storage.
edudatasci.net
November 3, 2025 at 3:04 PM
Now that I've talked about #Releases and why they matter in #DataEngineering, it's probably time to look under the bed for the boogeyman that so many #Agile projects struggle with: #Requirements. #SoftwareRequirements make what you're building tangible, and form the blueprint that you're building…
Releases Imply Requirements
In a recent post, I argued that a real release is a declaration—a line in the sand that says, this is the version we stand behind. A declaration begs a follow‑up: what exactly are we declaring? The honest answer is: requirements. A release without requirements is just a pile of diffs; a release grounded in requirements is a promise we can audit, test, and keep.
edudatasci.net
October 30, 2025 at 2:00 PM
#MSFabric is still developing better #CICD and #Release tools, but there are already strong capabilities ready to be leveraged. This is a quick guide to developing a strong CI/CD and release pipeline in your MS Fabric projects.
Releases and CI/CD in Microsoft Fabric — with Variable Libraries That Keep Meaning Stable
I keep saying the quiet part out loud: a modern warehouse ships meaning and trust, not just tables. If meaning changes invisibly, trust evaporates. Releases, Release Flow, and CI/CD in Microsoft Fabric are how you move quickly and keep confidence—by making change observable, reversible, and governed. Fabric’s Variable Library and a deliberate, database‑level metadata library are the glue that make this work day to day.
edudatasci.net
October 24, 2025 at 2:11 PM
Thinking more about data #ProjectManagement recently, I've been feeling like I was missing a good friend from my early days working in programming: #Releases. In data, we're starting to get much better about applying #CICD ideas and making sure that we have strong, testable processes, but we also…
Why We Still Need Real Releases in Data and Analytics
In an era where everything markets itself as “continuous”—continuous integration, continuous delivery, continuous retraining—it can feel quaint to talk about releases. But if we care about reliability and governance, we should talk about them more, not less. A true software‑style release is not nostalgia; it’s a commitment device. It’s the point where we say: this is the version we stand behind…
edudatasci.net
October 23, 2025 at 2:01 PM
A surprisingly enduring alternate #DataWarehouse #DataArchitecture is #DataVault. Designed to resolve some of the problems of traditional models in the SQL-focused past, #DataVault has seen many of its ideas leak out and become mainstream, and where Data Vault's strengths lie, it's still a strong…
Data Vault, Practically: Why It Exists, How It’s Built, and What 2.1 Changes
Modern data platforms live in tension: Source systems evolve faster than dimensional models can absorb. Audit and lineage are mandatory, but teams still need velocity. Cloud lakehouses, streaming, and domain ownership do not slot neatly into yesterday’s warehouse playbooks. Data Vault is a response to those pressures. It is both a modeling approach and a delivery method designed to (1)…
edudatasci.net
October 22, 2025 at 2:06 PM
I've talked before about how trust is the ultimate currency of #DataTeams. One of the key ways we build that trust is to have strong, reliable #Testing protocols as part of our #DataEngineering practice. By passing tests, we not only assure ourselves, but also our customers and stakeholders, that…
Testing Like We Mean It: Bringing Software‑Grade Discipline to Data Engineering
I like to say that the first product of a data team isn’t a table or a dashboard—it’s trust. Trust is built the same way in data as it is in software: through tests that catch regressions, encode intent, and make change safe. If pipelines are code, then they deserve the same rigor as code. That means unit tests you can run in seconds, integration tests that respect the messy edges of reality, comprehensive tests that exercise the platform end‑to‑end, and user acceptance testing that proves the system answers the questions people actually have.
edudatasci.net
October 20, 2025 at 2:05 PM
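The "unit tests you can run in seconds" idea from the post above can be sketched in miniature. This is a pure-Python illustration of my own (the function, its fields, and the test are assumptions, not code from the article): one small, deterministic check on one transformation, runnable without any platform.

```python
from datetime import date

def clean_enrollment(row: dict) -> dict:
    """Standardize one raw enrollment record (illustrative transform):
    trim identifiers, parse dates, and default a missing status."""
    return {
        "student_id": str(row["student_id"]).strip(),
        "enrolled_on": date.fromisoformat(row["enrolled_on"]),
        "status": (row.get("status") or "unknown").strip().lower(),
    }

def test_clean_enrollment():
    # A messy source row: padded id, mixed-case status.
    raw = {"student_id": " 42 ", "enrolled_on": "2025-09-01", "status": " Active"}
    out = clean_enrollment(raw)
    assert out == {
        "student_id": "42",
        "enrolled_on": date(2025, 9, 1),
        "status": "active",
    }
```

A test like this runs in milliseconds under pytest or plain `assert`, which is what makes it cheap enough to run on every change.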
In #MSFabric, we've been doing things the way we always have because it was supported, but with the latest developments in mirroring and shortcuts, #DataArchitects have new levers to use to reduce the costs of a data architecture. #DataEngineers can now use Bronze as a live layer, while archiving…
Bronze Is Live Now: What Mirroring + Shortcuts Really Change About Cost, Archives, and Getting to Silver
For years, “Bronze” quietly became a parking lot for periodic snapshots: copy a slice from the source every hour/day, write new files, repeat. It worked, but it was noisy and expensive—lots of hot storage, lots of ingest compute, and a tendency to let “temporary” landing data turn into de‑facto history. Fabric upends that with two primitives that encourage Zero Unmanaged…
edudatasci.net
October 17, 2025 at 2:03 PM
I’ve written a lot about #ImprovementScience as the engine of practical change. This post moves up a level: how organizations—of any type, not just public institutions—build the chassis around that engine so learning turns into durable, mission‑aligned #Innovation. This #GovernedInnovation makes it…
Governed Innovation: Turning Learning Loops into Enterprise Strategy
Governance, done well, accelerates innovation. That sounds counterintuitive because “governance” often conjures gatekeeping and delay. But in complex systems, enabling constraints—clear aims, decision rights, evidence standards, and risk guardrails—reduce thrash. They let teams move faster with less politics, less ambiguity, and fewer expensive reworks. Put simply: Governed innovation = purposeful exploration + disciplined decisions + explicit guardrails. Purposeful exploration means we start from outcomes the organization actually cares about (growth, safety, quality, equity, cost-to-serve) and frame hypotheses against those aims.
edudatasci.net
October 16, 2025 at 2:02 PM
With the talk about #DataMesh as a #DataArchitecture, it can be very easy to forget that it's not, or at least isn't only a data architecture. It's actually a particular kind of #DataStrategy, one that integrates distributed #DataGovernance at its very core.
No Governance, No Mesh: Why Compatibility Is the Currency of Data Products
I love the promise of data mesh: push data ownership to the edges, let domain teams ship data as products, and watch the organization move faster. But here’s the unglamorous truth we keep repeating in classrooms and boardrooms: a mesh without strong, distributed data and analytics governance is just a tangle. Autonomy without agreed‑upon rules yields incompatible data products, brittle integrations, and an ever‑growing integration tax.
edudatasci.net
October 15, 2025 at 2:03 PM
In #MSFabric, one of my favorite new technologies is #MaterializedLakeViews. MLVs are currently in preview, but you can already see the potential to completely change the way #DataEngineers and #DataArchitects interact with and build multi-layer data architectures. The idea of #ZeroUnmanagedCopies…
Materialized Lake Views (MLVs) in Microsoft Fabric
A Materialized Lake View (MLV) is a table in your Fabric lakehouse that’s defined by a SQL query and kept up‑to‑date by the service. You write one CREATE MATERIALIZED LAKE VIEW … AS SELECT … statement; Fabric figures out dependencies, materializes the result into your lakehouse, and refreshes it on a schedule. Today, MLVs are in preview, SQL‑first (Spark SQL), and designed to make Medallion layers (Bronze → Silver → Gold) declarative instead of hand‑assembled pipelines.
edudatasci.net
October 14, 2025 at 2:04 PM
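As a conceptual sketch of what "Fabric figures out dependencies and refreshes" means in the excerpt above, here is a pure-Python toy of my own (the names and structure are assumptions, not the MLV engine): each view declares its inputs and a query, and the "service" materializes them in dependency order.

```python
def refresh_all(tables: dict, views: dict) -> dict:
    """Conceptual MLV-style refresh. `views` maps a view name to
    (dependency_names, query_fn); results are materialized only after
    everything they depend on has been materialized."""
    materialized = dict(tables)
    remaining = dict(views)
    while remaining:
        # Pick every view whose dependencies are already available.
        ready = [name for name, (deps, _) in remaining.items()
                 if all(d in materialized for d in deps)]
        if not ready:
            raise ValueError("circular or unmet view dependency")
        for name in ready:
            deps, query = remaining.pop(name)
            materialized[name] = query(*(materialized[d] for d in deps))
    return materialized
```

The point of the sketch is the declarative shape: you state what each layer is, not when to rebuild it, and the scheduler derives the Bronze → Silver → Gold order.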
Today, for #FabricFriday, I'm taking a deep dive into one of the most promising features in the #MSFabric Lakehouse. That's the Delta Change Data Feed. Right now, #DataEngineers can only pull data from traditional delta files written in traditional ways, but one can see a future where mirrored…
The Microsoft Fabric Delta Change Data Feed (CDF)
In Microsoft Fabric you’re sitting on top of Delta Lake tables in OneLake. If you flip on Delta Change Data Feed (CDF) for those tables, Delta will record row‑level inserts, deletes, and updates (including pre‑/post‑images for updates) and let you read just the changes between versions. That makes incremental processing for SCDs (Type 1/2) and Data Vault satellites dramatically simpler and cheaper because you aren’t rescanning entire tables—just consuming the “diff.” Fabric’s Lakehouse fully supports this because it’s natively Delta; Mirrored databases land in OneLake as Delta too, but (as of September 2025) Microsoft hasn’t documented a supported way to…
edudatasci.net
October 10, 2025 at 2:04 PM
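The "consume the diff instead of rescanning" pattern from the CDF excerpt above can be shown with a minimal pure-Python sketch. It assumes rows tagged with Delta's `_change_type` values (`insert`, `update_preimage`, `update_postimage`, `delete`); it illustrates the idea only and is not the Fabric or Delta API.

```python
def apply_change_feed(target: dict, changes: list) -> dict:
    """Apply CDF-style change rows to a target keyed by primary key,
    touching only the rows that changed between versions."""
    for row in changes:
        kind = row["_change_type"]
        key = row["id"]
        if kind in ("insert", "update_postimage"):
            # Upsert the new image, dropping CDF metadata columns.
            target[key] = {k: v for k, v in row.items() if not k.startswith("_")}
        elif kind == "delete":
            target.pop(key, None)
        # update_preimage rows carry the old values; a Type 2 SCD would use
        # them to close out the prior version, so a Type 1 merge skips them.
    return target
```

The same shape is what a Type 1 merge does against a Silver table: the source is the change feed, not the full Bronze scan.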
Building on the ideas of the #InformationGovernance stack, I'm going to talk about the least talked about part of information management, and a critical component of a strong #Innovation strategy. If you don't agree on how you measure your results, how can you know where you'll have #Impact?
Analytics Governance: the Missing Middle of the Information Governance Stack
Most organizations have matured data governance (quality, ownership, catalogs) and are racing to formalize AI governance (risk, bias, safety, model monitoring). Application governance (SDLC, access, change control) keeps production systems stable. But the layer where business decisions actually touch numbers—analytics—often sits in a gray zone. KPI definitions live in wikis, dashboards implement subtle variations of the “same” metric, and spreadsheets quietly fork the math.
edudatasci.net
October 10, 2025 at 2:08 AM
Many #DataAnalysts get frustrated with #StarSchema, especially when they change the way they're used to slicing and aggregating their data. "I'm used to having everything ready to go in my visualization tool, this is so much harder, now I need to think about how all these aggregates work together,…
Why Star Schemas Make Analysts Faster (and Happier)
If you live in spreadsheets or SQL all day, the “one big table” (OBT) feels like home. Everything you need is right there: one row per thing, a column for every attribute, and no joins to worry about. It’s a great way to explore data fast—until it isn’t. This post explains, in plain language, why the star schema pays you back every day you analyze data, and how it keeps the speed you love without the headaches you’ve learned to live with.
edudatasci.net
October 8, 2025 at 2:10 PM
In a previous post, I talked about a revelation I had with regard to the way we could architect #DataProducts in a more efficient way in #MSFabric. The more I thought about this, the more I thought that this change would have (and is currently having) a profound effect on the way we structure our…
A New Paradigm For Data Teams: The Changing Role of the Data Visualization Engineer
When teams build warehouses the old way—source → bronze → silver → gold → semantic—visualization and semantic specialists are invited in at the end. Their job looks reactive: wire up a few visuals, name some measures, make it load fast enough. They inherit whatever the pipeline produced, then try to make meaning out of it. The failure mode is predictable: pixel‑perfect charts sitting on semantic quicksand, with definitions that shift underfoot and performance that depends on structures no one designed for the questions at hand.
edudatasci.net
October 6, 2025 at 2:03 PM
We are starting to see discussion about the concept of #InformationGovernance, and there's been a lot of discussion about how it's the same as #DataGovernance. I see them as two separate but related disciplines, with information governance combining the disciplines of #DataGovernance, #AiGovernance,…
Information Governance: The Backbone That Unifies Data, AI, Applications, and Analytics
Information governance (IG) is the strategy, accountability, and control system for how an organization collects, classifies, uses, protects, shares, retains, and disposes of information across its entire lifecycle. It is: Scope‑wide: Covers structured data, unstructured content, model artifacts, code, dashboards, and records (including legal/records management and privacy). Lifecycle‑aware: From intake and creation → active use → archival → retention/disposition and legal holds.
edudatasci.net
October 2, 2025 at 2:11 PM
Defining terms is important, but in #DataArchitecture, we're often moving so fast that we're not thinking about how we got here. In #DataWarehouses, the two names that stand above almost all of the rest (except perhaps Codd) are Kimball and Inmon, and we use the distilled learnings from both of…
Baselines Over Buzzwords: From Warehouse to Lakehouse
If you’ve built data systems long enough, you’ve lived through at least three architectural moods: the tidy certainty of Kimball and Inmon, the anarchic freedom of “throw everything in the data lake to ingest quickly,” and today’s lakehouse, which tries to keep our speed without losing our sanity. I've always cared less about labels and more about baselines—clear, durable expectations that make change safe.
edudatasci.net
October 1, 2025 at 2:02 PM
A conversation last week with @ErinSanders got me thinking about how some of the #DataEngineering challenges we face come from the process that we use to develop #DataProducts. #MSFabric allows us to literally #BeginWithTheEndInMind and develop from the goal we are trying to reach, rather than…
A New Paradigm For Data Teams: The real bottleneck isn’t data, it’s definition
Most data teams still run a tidy assembly line: ingest sources into bronze, standardize into silver, curate into gold, and only then wire up a semantic model for BI. That sounds rigorous—but it puts the business contract (grain, conformed dimensions, measure logic, security scope, and SLOs) at the very end. By the time the organization finally argues about what “AUM” or a compliant “time‑weighted return” …
edudatasci.net
September 29, 2025 at 2:03 PM
I was surprised by how much got released at #FabConEurope for #MSPurview and #MSFabric. In this post, I take you through what I think are the biggest announcements from a #DataMesh perspective and what they'll bring to your organization.
FabCon Feature: Purview
On edudatasci.net, I keep data mesh grounded in four behaviors: domains own data; data as a product; a small self‑serve platform; and federated governance (policies expressed as code and applied consistently). I also use foundational vs derived data products as a practical way to think about scope and ownership, and I recommend publishing products in Purview’s Unified Catalog so ownership, access and SLOs are discoverable to the org, not just the team that built them.
edudatasci.net
September 26, 2025 at 2:06 PM