Module 19 of 26 · Applied

Data as a product

15 min read · 3 outcomes · Interactive + drag challenge · 4 standards cited

By the end of this module you will be able to:

  • Explain Dehghani's data mesh principles
  • Describe data contracts and their purpose
  • Evaluate when data-as-product thinking applies

Real-world architecture · 2019

Zhamak Dehghani asked: what if data teams operated like product teams?

In 2019, Zhamak Dehghani published "How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh" while working at ThoughtWorks. The core argument: centralised data platforms create bottlenecks because one team cannot understand every domain's data.

Data mesh proposes that domain teams own their data as products, with the same rigour applied to software products: documented APIs, quality guarantees, versioning, and consumer feedback loops. The previous module covered how to structure data. This module covers how to treat it as a product.

Centralised data teams become bottlenecks at scale. What if the teams that produce data also owned it as a product, with SLAs, documentation, and quality guarantees?

Treating data as a product means applying product management principles. Who are the consumers? What do they need? How do we measure satisfaction? This shift changes data teams from cost centres (infrastructure providers) to value creators (product owners delivering reliable, documented, discoverable data).

The module begins with the four data mesh principles.

19.1 The four principles of data mesh

  1. Domain ownership: the team that generates data owns it. The sales team owns sales data. The logistics team owns delivery data. Domain experts understand the data better than a centralised platform team ever could.
  2. Data as a product: domain teams treat their data outputs as products with consumers. Products have documentation, quality SLAs, versioning, and feedback channels.
  3. Self-serve data platform: a shared infrastructure layer (compute, storage, cataloguing, access control) that domain teams use without needing platform engineering skills.
  4. Federated computational governance: global standards (naming conventions, quality thresholds, security policies) enforced computationally rather than through manual review.
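
To make "enforced computationally" concrete, the fourth principle can be sketched as a small policy check that runs automatically instead of a manual review. This is a minimal sketch under stated assumptions: the rules (snake_case names prefixed by the owning domain, a mandatory owner, a 99% completeness floor) and every name in the code are hypothetical, chosen for illustration rather than taken from Dehghani's text.

```python
import re
from dataclasses import dataclass

# Hypothetical global policy. These rules are assumptions made for
# illustration, not standards defined by data mesh itself.
NAME_PATTERN = re.compile(r"^[a-z]+(_[a-z0-9]+)*$")
MIN_COMPLETENESS = 0.99


@dataclass
class DataProductDescriptor:
    name: str
    domain: str
    owner: str
    completeness_target: float


def check_policies(product: DataProductDescriptor) -> list[str]:
    """Return a list of policy violations (an empty list means compliant)."""
    violations = []
    if not NAME_PATTERN.match(product.name):
        violations.append(f"name '{product.name}' is not snake_case")
    if not product.name.startswith(product.domain + "_"):
        violations.append("name must start with the owning domain")
    if not product.owner:
        violations.append("owner is missing")
    if product.completeness_target < MIN_COMPLETENESS:
        violations.append("completeness target below global threshold")
    return violations


compliant = DataProductDescriptor(
    "sales_orders", "sales", "sales-team@example.com", 0.995
)
print(check_policies(compliant))  # prints []
```

The design choice worth noting is that the standards are set globally (federated), while each domain team runs the same check against its own products.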

A data product is a dataset, API, or model that is treated with the same rigour as a software product: it has an owner, documentation, quality guarantees, and a defined consumer base.

Zhamak Dehghani, 'Data Mesh: Delivering Data-Driven Value at Scale' (2022) - Chapter 4

Dehghani's definition reframes data from a byproduct of business operations to a first-class product. The shift is cultural as much as architectural: domain teams must accept accountability for data quality, not just for application functionality.

Common misconception

Data mesh means every team builds its own data infrastructure.

Data mesh separates domain ownership (who is responsible for the data) from platform infrastructure (the shared technology layer). The self-serve platform principle explicitly provides shared compute, storage, and tooling. Domain teams own the content and quality; the platform team owns the infrastructure. Duplicating infrastructure across teams is the opposite of data mesh.

With the four principles established (domain ownership, data as a product, a self-serve platform, and federated computational governance), the next section turns to data contracts, which put them into practice.

Data products have dashboards tracking quality, freshness, consumer count, and SLA compliance. The product owner monitors these the same way a software team monitors uptime.
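
Two of the dashboard metrics named above, freshness and SLA compliance, could be computed along the following lines. This is an illustrative sketch only: the function names, the 30-minute SLA, and the sample timestamps are assumptions, and a real dashboard would read update times from the platform's metadata rather than a hard-coded list.

```python
from datetime import datetime, timedelta, timezone


def is_fresh(last_updated: datetime, now: datetime, sla: timedelta) -> bool:
    """True when the most recent update falls within the freshness SLA."""
    return now - last_updated <= sla


def sla_compliance(checks: list[bool]) -> float:
    """Fraction of freshness checks that passed; 1.0 when there are none."""
    return sum(checks) / len(checks) if checks else 1.0


# Hypothetical observations: minutes since last update at three check times.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
sla = timedelta(minutes=30)
last_updates = [now - timedelta(minutes=m) for m in (10, 45, 30)]

checks = [is_fresh(u, now, sla) for u in last_updates]
print(checks)  # prints [True, False, True]
print(round(sla_compliance(checks), 2))  # prints 0.67
```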

19.2 Data contracts in practice

A data contract is the interface between a data product and its consumers. Module 13 introduced contracts conceptually. In the data-as-product context, the contract is the product specification: it defines what consumers receive and what guarantees the producer makes.

A contract typically includes: schema definition (column names, types, constraints), freshness SLA (data available within N minutes), quality thresholds (completeness above 99%, no nulls in key fields), semantic definitions (what each field means), and an owner contact for escalation.
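
The elements listed above can be expressed as a small machine-readable structure. The sketch below is one possible shape, not a standard format: the `DataContract` class, its field names, and the `validate_batch` helper are all hypothetical, and the check covers only schema, key-field nulls, and completeness (freshness would be checked against timestamps separately).

```python
from dataclasses import dataclass, field


@dataclass
class DataContract:
    owner: str                       # escalation contact
    schema: dict[str, type]          # column name -> expected Python type
    key_fields: list[str]            # fields that must never be null
    min_completeness: float = 0.99   # required share of non-null values
    freshness_minutes: int = 60      # data available within N minutes
    semantics: dict[str, str] = field(default_factory=dict)  # field meanings


def validate_batch(contract: DataContract, rows: list[dict]) -> list[str]:
    """Check a batch of rows against the contract; return any violations."""
    violations = []
    total = filled = 0
    for i, row in enumerate(rows):
        if set(row) != set(contract.schema):
            violations.append(f"row {i}: columns do not match schema")
            continue
        for column, expected in contract.schema.items():
            value = row[column]
            total += 1
            if value is None:
                if column in contract.key_fields:
                    violations.append(f"row {i}: null in key field '{column}'")
                continue
            filled += 1
            if not isinstance(value, expected):
                violations.append(f"row {i}: '{column}' is not {expected.__name__}")
    if total and filled / total < contract.min_completeness:
        violations.append(f"completeness {filled / total:.2%} below threshold")
    return violations


contract = DataContract(
    owner="sales-data@example.com",
    schema={"order_id": int, "amount": float, "region": str},
    key_fields=["order_id"],
)
rows = [
    {"order_id": 1, "amount": 9.5, "region": "EU"},
    {"order_id": None, "amount": 3.0, "region": None},
]
for problem in validate_batch(contract, rows):
    print(problem)
```

For the sample rows, the second row triggers a key-field violation and pulls completeness to four filled values out of six, so the batch fails both checks.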

Data contracts shift the responsibility for data quality from consumers, who discover problems after the fact, to producers, who must meet defined standards before publishing.

Andrew Jones, 'Data Mesh in Practice' (2023) - Chapter 6

This shift is the core operational change: producers cannot push schema changes, quality degradation, or undocumented fields onto consumers. The contract enforces accountability at the source.
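
One way a producer might catch such changes before they reach consumers is to compare a proposed schema against the published one. The following is a hedged sketch: the function name, the representation of a schema as a column-to-type-name mapping, and the rule that every field must appear in a documented set are assumptions for illustration, not part of any cited source.

```python
def contract_violations(
    old: dict[str, str], new: dict[str, str], documented: set[str]
) -> list[str]:
    """Compare a proposed schema (column -> type name) with the published
    one and list changes the contract should reject."""
    problems = []
    for column, old_type in old.items():
        if column not in new:
            problems.append(f"removed column '{column}'")
        elif new[column] != old_type:
            problems.append(f"'{column}' changed type {old_type} -> {new[column]}")
    for column in new:
        if column not in documented:
            problems.append(f"undocumented field '{column}'")
    return problems


published = {"order_id": "int", "amount": "float"}
proposed = {"order_id": "string", "amount": "float", "currency": "string"}

for problem in contract_violations(published, proposed, {"order_id", "amount"}):
    print(problem)
# prints:
# 'order_id' changed type int -> string
# undocumented field 'currency'
```

A documented new column passes this check, on the assumption that additive changes do not break existing consumers; a stricter contract could require a version bump for those as well.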

Common misconception

Data mesh only works for large organisations with hundreds of engineers.

The principles scale down. A team of 20 can apply domain ownership (the marketing analyst owns marketing data), data-as-product thinking (documented, quality-checked datasets), and shared infrastructure (a single cloud warehouse with access controls). You do not need a platform team of 50 to benefit from treating data as a product.

Data contracts document the agreement between producers and consumers. They are enforced computationally: CI/CD checks validate schema and quality before data is published.

19.3 Check your understanding

A company's centralised data platform team is a bottleneck: domain teams wait weeks for data to be ingested and transformed. Which data mesh principle most directly addresses this?

A data product has a freshness SLA of 60 minutes. Data has not been updated for 3 hours. What should happen?

A startup with 15 employees wants to adopt data mesh. A consultant says they need a dedicated platform team, domain data teams, and a governance council. Is this advice correct?

Key takeaways

  • Data mesh is built on domain ownership, data as a product, self-serve data platform, and federated computational governance. Together they decentralise data responsibility while maintaining standards.
  • Treating data as a product means documentation, quality SLAs, versioning, and consumer feedback loops. The shift is cultural: domain teams accept accountability for data quality, not just application functionality.
  • Data contracts are the interface between data products and consumers. They specify schema, freshness, quality thresholds, and ownership. Violations block publishing.
  • Data mesh principles scale down. A small team can apply domain ownership and product thinking without dedicated platform teams or governance councils.

Standards and sources cited in this module

  1. Dehghani, Z. (2022). Data Mesh: Delivering Data-Driven Value at Scale

    Chapters 3-6

    Foundational text defining all four data mesh principles. Used throughout this module.

  2. Jones, A. (2023). Data Mesh in Practice

    Chapter 6, Data Contracts

    Practical implementation guidance for data contracts in mesh architectures.

  3. Dehghani, Z. (2019). 'How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh', ThoughtWorks

    Full article

    Original article introducing data mesh. The opening case study references this publication.

  4. DAMA-DMBOK2 (2017)

    Chapter 3, Data Governance

    Governance principles that data mesh extends with computational enforcement.
