Data Product
A semantic data asset that provides the information a Use Case requires, selected for its semantic relevance and suitability to that use case's requirements.
What Is a Data Product?
A Data Product is a semantic data asset that provides the information a Use Case requires. In later stages of the use case lifecycle, a use case can be attached to existing semantic data products that already provide that information.
Data Products are reusable components that encapsulate data together with its meaning (semantics), making that data available across multiple use cases.
The Data Economy
Think of your entire data landscape as a data economy: like any economy, it has a supply side and a demand side.
- Supply side — Data Products define what data is available, how it's packaged, and how it can be accessed. They represent the "supply" in the data economy.
- Demand side — Use Cases define who needs data, what they need it for, and why. They represent the "demand" in the data economy.
People often understand Data Products (the supply side) but struggle with the demand side: who uses these data products, and for what reasons? That is where Use Cases come in. A Use Case defines the demand by specifying what data is needed and why.
This economic model helps organizations understand the flow of data, identify gaps between supply and demand, and ensure that data products are created to meet actual business needs.
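One way to make the gap between supply and demand concrete is to compare the concepts each use case needs with the concepts the available data products cover. The sketch below assumes both sides are tagged with plain concept names. The product names, use case names, and the `find_gaps` helper are hypothetical illustrations, not part of the method.

```python
# Hypothetical illustration: all names below are invented for this sketch.

# Supply side: the concepts each data product provides.
supply = {
    "customer-master": {"Customer", "Address"},
    "trade-history": {"Trade", "Instrument"},
}

# Demand side: the concepts each use case requires.
demand = {
    "know-your-customer": {"Customer", "Address", "LegalEntity"},
    "trade-surveillance": {"Trade", "Instrument"},
}


def find_gaps(supply, demand):
    """Return, per use case, the required concepts that no data product supplies."""
    supplied = set().union(*supply.values())
    return {
        use_case: sorted(required - supplied)
        for use_case, required in demand.items()
        if required - supplied
    }


gaps = find_gaps(supply, demand)
print(gaps)
```

Use cases whose needs are fully covered do not appear in the result, so an empty dictionary means supply meets demand for the concepts listed.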
Why Data Products Matter
Data Products enable:
- Reuse — Data can be packaged once and reused across multiple use cases
- Consistency — The same data product ensures consistent meaning and structure across use cases
- Efficiency — Avoids duplicating data preparation and transformation work
- Quality — Data products are designed, tested, and maintained as reusable assets
Semantic relevance matters
The choice of data products for a use case should be based on their semantic relevance and suitability to the use case's requirements, not just technical availability.
When Are Data Products Used?
Data Products are typically identified and attached to use cases in later stages of the lifecycle, once:
- The use case requirements are well understood
- The data needs are clearly defined
- Existing data products can be evaluated for suitability
This allows use cases to leverage existing, proven data products rather than creating new ones from scratch.
What Is a Data Product in the Use Case Tree Method?
A Data Product is a semantic data asset that provides the information a Use Case requires. It encapsulates data along with its semantic meaning (defined through Ontologies), making it a reusable component in the Enterprise Knowledge Graph (EKG).
Semantic Data Products
Data Products in the Use Case Tree Method are semantic — they include:
- Data — The actual data values
- Semantics — The meaning of the data, defined through ontologies
- Structure — How the data is organized
- Metadata — Information about the data product itself
This semantic nature enables the EKG to understand what the data means, not just what it contains.
Selection Criteria
Choose the semantic data products for a given use case based on:
- Semantic relevance — Does the data product contain concepts and terms that match the use case's vocabulary?
- Suitability — Does the data product meet the use case's requirements?
- Quality — Is the data product well-maintained and reliable?
- Availability — Is the data product accessible and performant?
Semantic matching
The most important criterion is semantic relevance — the data product should align with the use case's concepts and terminology, enabling semantic integration.
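The criteria above can be sketched as a simple ranking. The method does not prescribe a formula, so the measure used here (the fraction of the use case's concepts that a product covers) is an assumption, as are the `Candidate` class and all example names. Availability is treated as a filter and semantic relevance as the ranking key. Suitability and quality are left out because they usually need human judgment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A data product under evaluation (all fields are illustrative)."""
    name: str
    concepts: frozenset  # concepts covered by the product's ontology
    available: bool      # whether the product is currently accessible


def semantic_relevance(required, provided):
    """Fraction of the use case's concepts that the product covers (0.0 to 1.0)."""
    if not required:
        return 0.0
    return len(required & provided) / len(required)


def rank_candidates(required, candidates):
    """Rank accessible candidates by semantic relevance, highest first."""
    scored = [
        (semantic_relevance(required, c.concepts), c.name)
        for c in candidates
        if c.available
    ]
    # Sort by score descending, then by name for a stable, readable order.
    return sorted(scored, key=lambda pair: (-pair[0], pair[1]))


required = frozenset({"Customer", "Address", "LegalEntity", "Account"})
candidates = [
    Candidate("customer-master", frozenset({"Customer", "Address", "Account"}), True),
    Candidate("trade-history", frozenset({"Trade", "Instrument"}), True),
    Candidate("legal-entities", frozenset({"LegalEntity", "Customer"}), False),
]

ranking = rank_candidates(required, candidates)
print(ranking)
```

In practice the concept sets would come from the ontologies behind each data product rather than from hand-written strings.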
Lifecycle Integration
In later stages of the use case lifecycle, a use case can be attached to semantic data products that already exist and can provide the required information.
This approach:
- Promotes reuse — Leverages existing data products rather than creating new ones
- Ensures consistency — Multiple use cases can use the same data product, ensuring consistent data
- Reduces effort — Avoids duplicating data preparation work
- Maintains quality — Uses proven, tested data products
Relationship to Other Concepts
Data Products relate to other core concepts:
- Use Cases — Use cases consume data products to fulfill their requirements
- Ontologies — Data products use ontologies to define the semantic meaning of their data
- Concepts — Data products contain data about concepts that are relevant to use cases
- Stories — Stories may require specific data products to fulfill their needs
This integration ensures that data products are not isolated assets but part of a cohesive, semantic model of the enterprise.
Reuse and Composition
Data Products are designed for reuse across multiple use cases. This enables:
- Composition — Use cases can combine multiple data products to meet their requirements
- Consistency — The same data product ensures consistent meaning across use cases
- Efficiency — Data preparation is done once and reused many times
- Maintainability — Changes to a data product benefit all use cases that use it
This reuse pattern aligns with the composable business approach, where data products become reusable components in the Enterprise Knowledge Graph.
What Is DPROD?
The Data Product Ontology (DPROD) is a specification developed by the Object Management Group (OMG) Enterprise Knowledge Graph Forum (EKGF) that provides a standardized way to describe Data Products using W3C Linked Data standards. DPROD is available at https://ekgf.github.io/dprod/.
DPROD builds on the W3C Data Catalog Vocabulary (DCAT) to enable publishers to describe Data Products and data services in a decentralized way. By using a standard model and ontology, DPROD facilitates the consumption and aggregation of metadata from multiple Data Marketplaces, increasing discoverability and enabling federated search.
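To illustrate why a shared model helps aggregation, the sketch below merges product descriptions from two marketplaces and searches the combined index. It is a simplification: plain dictionaries stand in for parsed DPROD metadata, and the field names (`id`, `title`, `domain`), the example.org IRIs, and both helper functions are hypothetical.

```python
def aggregate(*catalogs):
    """Merge product descriptions from several marketplaces, keyed by product IRI."""
    merged = {}
    for catalog in catalogs:
        for product in catalog:
            # Later catalogs add or override fields for the same IRI.
            merged.setdefault(product["id"], {}).update(product)
    return merged


def search(index, keyword):
    """Return the IRIs of products whose title mentions the keyword (case-insensitive)."""
    needle = keyword.lower()
    return sorted(
        iri for iri, product in index.items()
        if needle in product.get("title", "").lower()
    )


marketplace_a = [
    {"id": "https://example.org/products/uk-bonds", "title": "UK Bonds"},
]
marketplace_b = [
    {"id": "https://example.org/products/uk-bonds", "domain": "Fixed Income"},
    {"id": "https://example.org/products/us-bonds", "title": "US Bonds"},
]

index = aggregate(marketplace_a, marketplace_b)
print(search(index, "bonds"))
```

Because both marketplaces identify the product by the same IRI and use the same field names, their descriptions combine without any mapping step, which is the point of agreeing on one ontology.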
Why DPROD?
As organizations increasingly recognize the value of data as an asset and adopt decentralized data architectures (such as Data Mesh), the need for standardized methods to describe and manage data products consistently across platforms has become critical.
Without such a standard, organizations face:
- Inconsistent metadata across diverse data products
- Limited discoverability
- Interoperability issues that hinder data integration
- Difficulty scaling as data ecosystems grow
- Increased vendor lock-in
DPROD offers a solution by providing a clear schema for describing data products, ensuring they are discoverable, interoperable, and treated with the same level of accountability as traditional products.
DPROD Principles
DPROD follows two basic principles:
- Decentralize Data Ownership — To make data integration more efficient, tasks should be shared among multiple teams. DCAT helps by offering a standard way to publish datasets in a decentralized manner.
- Harmonize Data Schemas — Using shared schemas helps unify different data formats. DPROD provides a common set of rules for defining a Data Product, which users can extend as needed.
DPROD Model
The DPROD specification extends DCAT to connect Data Services to Data Products using input and output ports. A Data Product consumes data through its input ports and publishes data through its output ports.
The model consists of:
- Data Product (dprod:DataProduct) — A rational, managed, and governed collection of data, with purpose, value, and ownership, meeting consumer needs over a planned lifecycle
- Port (dcat:DataService) — A digital interface that provides access to a Dataset (input or output port)
- Distribution (dcat:Distribution) — A specific representation of a dataset (CSV, JSON, Parquet, etc.) which can conform to a physical model
- Dataset (dcat:Dataset) — A collection of related data that can conform to a logical model
Data Products can have:
- Input ports — Services that collect source data and make it available for transformation
- Output ports — Services that share generated data in a way that can be understood and trusted
- Lifecycle status — Development status (e.g., Ideation, Design, Build, Deploy, Consume)
- Owner — The agent accountable for the data product
- Domain — The business or information area supported
- Purpose — Objectives and intended usage
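The model above can be sketched as a JSON-LD style description built in Python. Treat it as an illustration rather than a validated DPROD document. The class names come from the list above, but the property names (`dprod:inputPort`, `dprod:outputPort`, `dprod:dataProductOwner`, `dprod:lifecycleStatus`, `dprod:domain`, `dprod:purpose`, `dprod:isAccessServiceOf`, `dprod:isDistributionOf`) and the `dprod` namespace IRI are assumptions to verify against the DPROD specification. Every example.org IRI is a placeholder, including the lifecycle status value.

```python
import json

# Assumption: prefix IRIs. The dcat and dct namespaces are the published W3C
# and Dublin Core ones; the dprod namespace should be confirmed in the spec.
CONTEXT = {
    "dprod": "https://ekgf.github.io/dprod/",
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
}

# Hypothetical data product; property names are assumptions (see lead-in).
data_product = {
    "@context": CONTEXT,
    "@id": "https://example.org/products/uk-bonds",
    "@type": "dprod:DataProduct",
    "dct:title": "UK Bonds",
    "dprod:purpose": "Provide UK government bond data for risk reporting",
    "dprod:dataProductOwner": {"@id": "https://example.org/people/jane-doe"},
    "dprod:domain": {"@id": "https://example.org/domains/fixed-income"},
    "dprod:lifecycleStatus": {"@id": "https://example.org/lifecycle/Consume"},
    "dprod:inputPort": [
        {
            "@type": "dcat:DataService",
            "dcat:endpointURL": {"@id": "https://example.org/ingest/bond-trades"},
        }
    ],
    "dprod:outputPort": [
        {
            "@type": "dcat:DataService",
            "dcat:endpointURL": {"@id": "https://example.org/api/uk-bonds"},
            "dprod:isAccessServiceOf": {
                "@type": "dcat:Distribution",
                "dct:format": {
                    "@id": "https://www.iana.org/assignments/media-types/application/json"
                },
                "dprod:isDistributionOf": {
                    "@type": "dcat:Dataset",
                    "dct:title": "UK 10-year Bonds",
                },
            },
        }
    ],
}


def output_dataset_titles(product):
    """Follow output port -> distribution -> dataset and collect dataset titles."""
    titles = []
    for port in product.get("dprod:outputPort", []):
        distribution = port.get("dprod:isAccessServiceOf", {})
        dataset = distribution.get("dprod:isDistributionOf", {})
        if "dct:title" in dataset:
            titles.append(dataset["dct:title"])
    return titles


print(json.dumps(data_product, indent=2))
print(output_dataset_titles(data_product))
```

The helper walks the same chain the model describes: a Data Product exposes a Port, the Port gives access to a Distribution, and the Distribution is one representation of a Dataset.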
Relationship to Use Case Tree Method
DPROD provides a standardized way to describe Data Products that aligns with the Use Case Tree Method's approach:
- Semantic description — DPROD uses ontologies to describe the semantic meaning of data products
- Reuse and composition — DPROD enables data products to be discovered and composed
- Lifecycle management — DPROD tracks the lifecycle status of data products
- Decentralized architecture — DPROD supports decentralized data ownership and management
When implementing Data Products in the Use Case Tree Method, DPROD can be used to provide standardized metadata that enables discovery, integration, and governance across the enterprise.
Key Features
DPROD enables:
- Unambiguous semantics — Provides clear, sharable semantics to answer "What is a data product?"
- Simplicity and expressiveness — Simple enough for anyone to use, but expressive enough to power large data marketplaces
- Reuse of existing infrastructure — Allows organizations to reuse their existing data catalogues and dataset infrastructure
- Harmonization — Shares common semantics across different Data Products, promoting consistency
For more details, examples, and the complete specification, see the DPROD documentation at https://ekgf.github.io/dprod/.
Data Products as Use Cases
A Data Product is essentially another Use Case with its own stereotype. You can stereotype a use case as a "data-product", an "upstream data-source", a "downstream data-sink", or whatever categorization you prefer. Fundamentally, though, a data product can be seen as just another use case.
This unified model means that:
- Data Products follow the same lifecycle as Use Cases
- They can be organized in the Use Case Tree
- They have the same components: Personas, Stories, Concepts, Outcomes
- They enable the same composability and reuse patterns
The stereotype simply indicates the primary purpose or role of that use case in the data economy.
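A minimal sketch of this unified model is a single node type with a stereotype field, so that data products sit in the same tree as any other use case. The `UseCase` class, its default stereotype value `"use-case"`, and the example names are assumptions made for illustration. Only the stereotype strings `"data-product"` and `"upstream data-source"` come from the text above.

```python
from dataclasses import dataclass, field


@dataclass
class UseCase:
    """A node in the Use Case Tree; a data product is a use case with a stereotype."""
    name: str
    stereotype: str = "use-case"  # assumed default; e.g. "data-product"
    children: list = field(default_factory=list)

    def add(self, child):
        """Attach a child use case and return it, so the caller can keep building."""
        self.children.append(child)
        return child

    def walk(self):
        """Yield this use case and all of its descendants, depth-first."""
        yield self
        for child in self.children:
            yield from child.walk()


root = UseCase("Client Onboarding")
root.add(UseCase("Customer Master", stereotype="data-product"))
kyc = root.add(UseCase("Know Your Customer"))
kyc.add(UseCase("Sanctions Feed", stereotype="upstream data-source"))

# Data products are found with the same traversal as any other use case.
data_products = [uc.name for uc in root.walk() if uc.stereotype == "data-product"]
print(data_products)
```

Because the stereotype is just a label on the node, adding a new categorization needs no new class or separate tree.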