Data Product
Category: Definitions
Status: Proposed
Context
What do we mean when we say "data product"?
Decision
In our context, a data product is a self-contained deployment unit that contains all components needed to process, interpret, and share analytical data.
A data product includes:
- Data transformation code
- Output port definitions
- Input port definitions
- Discovery and observability APIs
- Documentation
- SLOs
- Access control
- Platform dependencies (compute and storage resources)
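As a minimal sketch of the list above, the components can be modeled as fields of a single specification object. The class name, the field names, and the rule that every component must be non-empty are illustrative assumptions, not part of this policy.

```python
from dataclasses import dataclass, field, fields


@dataclass
class DataProductSpec:
    """Hypothetical specification mirroring the component list above."""

    name: str
    transformation_code: str = ""  # e.g. path to the transformation code
    output_ports: list[str] = field(default_factory=list)
    input_ports: list[str] = field(default_factory=list)
    discovery_observability_apis: list[str] = field(default_factory=list)
    documentation: str = ""
    slos: dict[str, str] = field(default_factory=dict)
    access_control: list[str] = field(default_factory=list)
    platform_dependencies: list[str] = field(default_factory=list)

    def missing_components(self) -> list[str]:
        """Return the names of components that are still empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


# Usage: a partially specified data product (all values are made up).
spec = DataProductSpec(
    name="orders",
    transformation_code="src/transform.sql",
    output_ports=["orders_v1"],
)
print(spec.missing_components())
```

A check like this could run in a CI/CD pipeline to flag data products that do not yet include every component.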
Conceptual Distinction
When we say "data product", we explicitly do not mean these related concepts:
- Data as a Product: The data mesh principle of applying product thinking to analytical data
- Data is our Product: We sell data to our customers
- Data Contract: The agreement under which data is made accessible
Consequences
- A data product is a software component that needs to be developed and maintained by a dedicated team
- The owning team can implement a data product without organizational dependencies on other teams
- The definition of a data product with all its components can be specified in one Git repository
- Data products can be interconnected and may rely on other data products
- This definition is in line with Zhamak Dehghani’s description of a data product as an architecture quantum
Automation
- A data product can be specified as a Terraform module that deploys the data product on the data platform through a CI/CD pipeline
- The data platform team is responsible for providing the Terraform module
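To illustrate how a team might consume such a module, the following sketch renders a module call in Terraform's JSON configuration syntax (a `.tf.json` file). The module source path and the input variables `domain` and `output_ports` are hypothetical. The actual interface is whatever the data platform team defines.

```python
import json


def render_module_call(name: str, module_source: str, inputs: dict) -> str:
    """Render a Terraform module call as JSON configuration text."""
    # Terraform's JSON syntax nests module calls as {"module": {<name>: {...}}}.
    block = {"module": {name: {"source": module_source, **inputs}}}
    return json.dumps(block, indent=2)


# Usage: the module source and input names below are assumptions.
config = render_module_call(
    name="orders",
    module_source="./modules/data-product",
    inputs={"domain": "sales", "output_ports": ["orders_v1"]},
)
print(config)
```

Writing this text to a `main.tf.json` file in the data product's Git repository would let the CI/CD pipeline apply it like any other Terraform configuration.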