Simplifying the ‘data product’
Through the continuing evolution of the modern data platform, there has been an exciting turn towards new architectural patterns and terms. What started with ‘data warehouses’ soon became ‘data lakes’ and subsequently ‘lakehouses’. Data sets with value have become data assets, which are then grouped into data products. Terms such as ‘data product’ are often used subjectively, but they don’t need to be. At GFT, we have implemented numerous architectural designs that focus on data products. This blog seeks to shed light on the subject, as simply as possible, for the benefit of all.
“If you can’t explain it simply, you don’t understand it well enough.”
The secret is in the name. Consider the giant retail marketplace, Amazon. Amazon is a platform that allows sellers to create and administer a virtual store and sell products to end consumers. Think of any product you are able to purchase. When you search for an item on Amazon, it returns a list of products. Your choice among them might be based on Amazon product reviews, pricing, ratings, comments, size or delivery time. Amazon is not the product. Amazon is the marketplace that allows you to browse the products on offer. Behind the scenes, those products have stores and owners (sellers), and those owners manage their inventory, including stock, usage, distribution, delivery and administrative information about each product. This is metadata of sorts.
The products on Amazon are not restricted to one kind of item or asset. It is possible to purchase a wide variety of items, from toothbrushes to gym equipment. A product is not defined as one type of object or asset, but as an item that may be purchased, regardless of what it is.
Now let’s think about data
Picture a data marketplace that lets you browse multiple data assets you could use for any purpose. Each item on offer is managed much like a product on Amazon, with an owner behind it and supporting metadata. In our terms, this is how we define the data product.
Data products include:
- Distributable datasets
- Reports and the pipelines that create them
- Machine learning models
Each of these data products is developed, managed, governed and distributed as an end-to-end solution. A product may be made up of a single asset or several, but the valuable, usable outcome of the process is the product.
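To make this definition a little more concrete, here is a minimal Python sketch. It assumes a data product can be modelled as one or more assets plus an owning team and descriptive metadata, published to a marketplace-style catalogue that consumers can search. The class names, fields and the `publish`/`search` methods are hypothetical illustrations, not a GFT or data mesh API.

```python
from dataclasses import dataclass, field


@dataclass
class DataAsset:
    """An object of worth, e.g. a dataset, report or ML model (hypothetical model)."""
    name: str
    kind: str  # e.g. "dataset", "report", "ml_model"


@dataclass
class DataProduct:
    """One or more assets bundled with the ownership and metadata that manage them."""
    name: str
    owner: str  # the owning domain team, like a seller on a marketplace
    assets: list[DataAsset]
    metadata: dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # A product is made up of at least one asset
        if not self.assets:
            raise ValueError("a data product needs at least one asset")


class DataMarketplace:
    """Lets consumers browse products; it lists them but does not manage them."""

    def __init__(self) -> None:
        self._products: dict[str, DataProduct] = {}

    def publish(self, product: DataProduct) -> None:
        # The owning team publishes its product; re-publishing replaces it
        self._products[product.name] = product

    def search(self, term: str) -> list[DataProduct]:
        # Case-insensitive match on the product name or any metadata value
        term = term.lower()
        return [
            p for p in self._products.values()
            if term in p.name.lower()
            or any(term in v.lower() for v in p.metadata.values())
        ]


# Usage: a hypothetical domain team publishes a product made of two assets
market = DataMarketplace()
market.publish(DataProduct(
    name="customer-churn",
    owner="retail-analytics",
    assets=[DataAsset("churn_features", "dataset"), DataAsset("churn_model", "ml_model")],
    metadata={"refresh": "daily", "description": "Churn risk scores"},
))
print([p.owner for p in market.search("churn")])  # ['retail-analytics']
```

The design choice mirrors the Amazon analogy: the marketplace only indexes and returns products, while ownership and metadata live with each product and its owner.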
Historically, enterprises built holistic solutions by combining different domains into one resource. Multiple disparate, unrelated items of value could be produced from one facility.
This sounds great, but it is like Amazon telling everyone who wants to sell on its marketplace that they must use one holistic store, with all of the inventory managed by a single store owner. In this scenario, everyone’s products depend on good management of that single store. Whilst this may mean less admin for individual sellers, the risk and dependency are far higher should that owner not maintain the store well. If that happened, sellers would be more likely to leave and create their own stores elsewhere.
Back to data: across the industry, there is a prevailing perception that enterprise data solutions are unsuccessful. This view is driven by experience of failed data warehouse and data lake projects that have stretched over several years. The crux of the matter is that each business unit’s requirements keep evolving, and at different rates. Whilst one part of the organisation may push for swift implementation, another may be indifferent. These tensions, unfortunately, surface whenever a centralised data repository is considered.
One way to work around this challenge is to implement the ‘data-as-a-product’ architectural design pattern, which gives ownership back to the domain itself. The merits of this approach are substantial: faster domain deployments, a reduced risk profile, closer data quality management at the domain level, streamlined governance and a clear delineation of data ownership, all of which are defined in the data mesh design pattern.
From data asset to ‘data-as-a-product’ to data product
Data products should not be confused with the term ‘data-as-a-product’, as defined in the data mesh. The two are similar, but not the same. If the data asset is the object of worth, and the data product is the asset with supporting management capability, then ‘data-as-a-product’ is the design approach for getting there, i.e. one of the principles of the data mesh.
Treating data as a product changes how you design your architecture, which leads to domain-level design patterns. The end-to-end managed asset is the data product, and the architectural design that delivers it comes from treating data as a product.
Is the concept of ‘data product’ a new evolution?
The data product is certainly the next step in the modern data platform, but it is not necessarily the only design pattern to consider. Having multiple teams producing data products is simply overkill if the enterprise in question is not large. Sometimes centralised IT and governance are not only useful, but necessary.
There are various ways to design any data architecture, and although data product design is highly effective for faster delivery and value creation, it isn’t the only way. At GFT, we consider all design patterns to ensure the best solution is chosen for the specific business strategy, which may or may not include data fabrics, data meshes or any other nuanced evolution within the modern data platform.
Download David Tuppen’s recent thought leadership paper ‘The evolution of the modern data platform’ via our website.