
Make data in your enterprise do more for less
Read time: 4 minutes 20 seconds
Data Fabric is the hottest, most promising trend in the Data and Analytics world!
Data Mesh! The new paradigm!
Lakehouse brings the best of the Data Lake and the Data Warehouse!
These are some of the trends from Gartner’s Data Management Hype Cycle for 2021. But for the longest time, data enthusiasts knew only two schools of thought: the 3NF Enterprise Data Warehouse (EDW) ideology propagated by Bill Inmon, and the datamart-based enterprise bus architecture popularized by Ralph Kimball, the two gurus of the data world.
Once enterprise needs started to outgrow the capabilities offered by both the EDW and datamarts, the industry started searching for new approaches. Enter “Big Data”. “Big Data” was the hottest thing, and everyone jumped on that bandwagon (I would say without too much thinking). It was super easy to onboard data, and it was faster and cheaper (not cheaper from a total cost of ownership (TCO) standpoint, but that’s how it was initially marketed). Enterprises gained the ability to ingest vast amounts of data and an enormous capability to process terabytes of it, first with the help of MapReduce and then with Spark. But in some cases these Big Data lakes became white-elephant projects: too long to execute, too cumbersome to consume from, and difficult to govern. The benefits offered by the Data Lake served some enterprise users, but not all.
I wanted to provide this perspective before talking about Data Mesh, because it’s important to understand why the Data Mesh architecture is relevant and timely for today’s data and analytics world. Data Mesh, a term coined by Zhamak Dehghani, is a network of distributed data products, linked together, that follow the FAIR principles (findable, accessible, interoperable, and reusable). It operates on four principles:
1. Domain-oriented decentralized data ownership and architecture
2. Data as a product (DaaP)
3. Self-serve data infrastructure as a platform
4. Federated computational governance
In this blog we will further explore Data as a Product. A data product is a node in the mesh that includes code, data and metadata, and infrastructure. The data-as-a-product principle in Data Mesh addresses the high cost of discovering, understanding, trusting, and using quality data. Given the involvement of multiple domain teams (decentralization) in a data mesh, the data-as-a-product principle is essential for addressing data quality and dark data (information produced by an organization but generally not used for analytical or other needs).
“Data as a product” is the result of applying product thinking to datasets, making sure they have a set of capabilities including discoverability, security, understandability, and trustworthiness. DaaP has been practiced for decades, and most companies follow it in some shape or form: a data team collects, cleans, integrates, and makes data available to consuming teams to create a service or an end product. For example, customer data is made available to the marketing team so they can run campaigns targeting the customers most likely to buy a product. The same data can be used by the risk team to identify potential fraud, or by the operations team to provide services to customers. Bloomberg’s financial data is one common and successful example of data as a product, and within every enterprise numerous examples can be found.
The DaaP principle in Data Mesh calls out specific capabilities a data product should demonstrate:
Discoverable
For data as a product to be discoverable, a search engine is needed, and users must be able to register datasets in this engine and request access to them (this also increases security, another capability explained below).
The first iteration of this capability could be just a list of datasets on the internal intranet; you can iterate and build incrementally from there.
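As a minimal sketch of what that first iteration might look like (everything here is hypothetical and not tied to any specific catalog product), a registry can start as little more than a searchable list that domain teams add their datasets to:

```python
# A minimal, hypothetical in-memory dataset registry -- the "list of
# datasets on the intranet" iteration, not a production catalog.
from dataclasses import dataclass, field

@dataclass
class DatasetEntry:
    name: str          # e.g. "customer_profile"
    domain: str        # owning domain team
    description: str   # what the dataset contains
    owner: str         # contact for access requests
    tags: list = field(default_factory=list)

class DatasetRegistry:
    def __init__(self):
        self._entries = {}

    def register(self, entry: DatasetEntry) -> None:
        """Domain teams register their datasets so others can find them."""
        self._entries[entry.name] = entry

    def search(self, keyword: str) -> list:
        """Naive keyword search over names, descriptions, and tags."""
        kw = keyword.lower()
        return [e for e in self._entries.values()
                if kw in e.name.lower()
                or kw in e.description.lower()
                or any(kw in t.lower() for t in e.tags)]

registry = DatasetRegistry()
registry.register(DatasetEntry(
    name="customer_profile",
    domain="customer",
    description="Golden record of customer attributes",
    owner="customer-data-team@example.com",
    tags=["customer", "pii"],
))
print([e.name for e in registry.search("customer")])  # ['customer_profile']
```

From here, later iterations can add access requests, usage statistics, and full-text search.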
Addressable
Having addressable datasets makes your teams more productive. On one side, data analysts and data scientists become autonomous in finding and using the data they need. On the other, data engineers face far fewer interruptions from people asking where they can find data about X.
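One way to make a dataset addressable, sketched below under an assumed (illustrative, not standard) `domain/product/version` convention, is to give it a stable logical address that is decoupled from wherever the data physically lives:

```python
# Hypothetical addressing scheme: consumers depend only on a stable
# logical address, so the platform team can move physical storage
# without breaking anyone.
PHYSICAL_LOCATIONS = {
    "customer/customer_profile/v1": "s3://lake/customer/profile/v1/",
    "sales/orders/v2":              "s3://lake/sales/orders/v2/",
}

def resolve(address: str) -> str:
    """Resolve a logical data product address to its current location."""
    try:
        return PHYSICAL_LOCATIONS[address]
    except KeyError:
        raise KeyError(f"Unknown data product address: {address}") from None

print(resolve("customer/customer_profile/v1"))
```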
Trustworthy
Checking data quality regularly and automatically is a must to fulfil the trustworthy characteristic of data as a product, and owners of the datasets need to react to the results of these checks.
Quality checks must be done at pipeline input and output, and it doesn’t hurt to provide contextual data quality information to consumers of the data, for example in Tableau dashboards.
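As an illustrative sketch in plain Python (in practice a data quality framework or your orchestrator’s test hooks would typically play this role; the thresholds and field names are assumptions), the same checks can run at pipeline input and output, with the metrics published alongside the data:

```python
# Sketch: automated quality checks at pipeline input and output.
from datetime import datetime, timedelta, timezone

def check_quality(rows: list) -> dict:
    """Compute simple quality metrics a consumer could also be shown."""
    total = len(rows)
    missing_email = sum(1 for r in rows if not r.get("email"))
    stale_cutoff = datetime.now(timezone.utc) - timedelta(days=1)
    stale = sum(1 for r in rows if r["updated_at"] < stale_cutoff)
    return {
        "row_count": total,
        "missing_email_pct": 100 * missing_email / total if total else 0.0,
        "stale_pct": 100 * stale / total if total else 0.0,
    }

def run_pipeline(input_rows: list) -> list:
    # Gate on input quality before doing any work.
    input_metrics = check_quality(input_rows)
    assert input_metrics["missing_email_pct"] < 5.0, "input failed quality gate"

    output_rows = [r for r in input_rows if r.get("email")]  # the "transform"

    # Re-check at output and publish the metrics with the data, so
    # consumers (e.g. a Tableau dashboard) see quality in context.
    output_metrics = check_quality(output_rows)
    print("input: ", input_metrics)
    print("output:", output_metrics)
    return output_rows

now = datetime.now(timezone.utc)
rows = [{"email": f"user{i}@example.com", "updated_at": now} for i in range(20)]
rows.append({"email": None, "updated_at": now})  # 1 bad row in 21 (~4.8%)
run_pipeline(rows)
```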
Understandable
Data users (analysts, data scientists, consuming applications) need to easily understand what the data is; it should be explained with sample datasets.
Interoperable
Datasets need to contain metadata that makes them understandable, and they should follow the same naming conventions, which makes the datasets interoperable.
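A single metadata record can serve both understandability (descriptions plus sample rows) and interoperability (shared conventions). Everything in the sketch below, from the field names to the conventions, is an illustrative assumption:

```python
# Hypothetical metadata record published with the dataset.
CUSTOMER_PROFILE_METADATA = {
    "dataset": "customer/customer_profile/v1",
    "description": "One row per customer; golden record of profile data.",
    # Conventions shared across the mesh make datasets interoperable,
    # e.g. snake_case field names and ISO-8601 dates.
    "fields": {
        "customer_id": "Stable surrogate key, never reused",
        "email": "Primary contact email, lowercased",
        "signup_date": "Date of first signup, ISO-8601 (YYYY-MM-DD)",
    },
    # Sample rows make the dataset understandable at a glance.
    "sample_rows": [
        {"customer_id": "c-001", "email": "a@example.com",
         "signup_date": "2021-03-14"},
    ],
}
```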
Secure
Data must be accessible securely: who can access it and who can approve access should be laid out upfront.
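A minimal sketch of how “who can access, who can approve” might be laid out upfront as data (the role names and contacts are illustrative assumptions):

```python
# Hypothetical access policy declared alongside the data product.
ACCESS_POLICY = {
    "dataset": "customer/customer_profile/v1",
    "readers": {"marketing_analyst", "risk_analyst"},  # who can access
    "approver": "customer-data-steward@example.com",   # who approves
}

def can_read(role: str) -> bool:
    """Anyone not in 'readers' must request approval from the approver."""
    return role in ACCESS_POLICY["readers"]

print(can_read("marketing_analyst"))  # True
print(can_read("intern"))             # False -> route request to approver
```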
If you really start thinking of Data as a Product, you can make the data in your enterprise do more for less! Let’s take the example of customer data. When Customer Data is a product, it will have the following capabilities to meet the DaaP principle (a sketch of such a specification follows the list):
- Has definitions for all its attributes (name, email, address, preferences)
- Documents where the attributes are sourced from and who owns them (which system is the authorized source for specific attributes, and who is responsible for creating, modifying, and deleting them)
- Identifies who certified that the data is correct (the data steward)
- Specifies the mechanism to access it: can you access it in real time and/or in batch mode?
- States who can access it and who can approve the access
- Explains whether it can be combined with other enterprise data, and how
- Covers how the product is priced, for internal users and for external users
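Put together, those capabilities could be captured in one product specification. The sketch below is hypothetical; the attribute definitions, source systems, owners, and pricing are illustrative assumptions, not a standard:

```python
# Hypothetical specification for "Customer Data as a Product".
CUSTOMER_DATA_PRODUCT = {
    "name": "customer/customer_profile/v1",
    "attributes": {
        "name":        {"definition": "Customer full legal name",
                        "source": "CRM", "owner": "sales-ops"},
        "email":       {"definition": "Primary contact email",
                        "source": "CRM", "owner": "sales-ops"},
        "address":     {"definition": "Current mailing address",
                        "source": "billing_system", "owner": "finance"},
        "preferences": {"definition": "Opt-in marketing preferences",
                        "source": "consent_portal", "owner": "marketing"},
    },
    "certified_by": "customer-data-steward@example.com",  # data steward
    "access": {
        "modes": ["real_time_api", "daily_batch_export"],
        "readers": ["marketing", "risk", "operations"],
        "approver": "customer-data-steward@example.com",
    },
    "combinable_with": ["sales/orders/v2"],  # joins on customer_id
    "pricing": {"internal": "chargeback per query",
                "external": "subscription, per seat"},
}
```

Because the specification travels with the product, a new consumer can evaluate it and request access without opening a ticket with the data team.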
Now anyone in the enterprise, or outside it (if applicable), can use this product. Every time someone needs customer data, they don’t need to go to the data team (or IT) to create or get it. This is a big step forward from just treating data as an asset.
Customer Data is now a product that is self-contained, has a product specification, a pricing structure, and life-cycle management, and works for the benefit and delight of its customers.
A Data Mesh architecture will help you build multiple Data Products and link them together. It will improve the enterprise’s agility in responding to a fast-changing market while improving quality, making all enterprise data usable, increasing reusability, reducing data duplication, and increasing trust. Do more for less! If data is the new oil, it first needs to be treated as a product.
Zensar’s Data Engineering and Analytics practice, coupled with experience-led engineering, is well positioned to bring velocity to your enterprise as it ideates, designs, implements, and launches Data Products.