Data Mesh Principles: Paradigms and Perspectives to Know
This is part of Solutions Review’s Premium Content Series, a collection of contributed columns written by industry experts in maturing software categories. In this submission, Octopai CEO Yael Ben Arie offers a view of data mesh principles through different paradigms and perspectives.
There are an impressive number of schools of thought when it comes to data management. Centralized data repositories. Data fabric. Data mesh. Data-as-an-asset. Data-as-a-service. Data-as-a-product.
And just as there are different strokes for different folks, there are different tools for different schools. While there is obviously overlap between these schools of thought, each requires its own set of critical data management tools and capabilities in order to deliver the hoped-for value.
The data mesh approach to data management has been on the community’s radar since Zhamak Dehghani’s landmark article in 2019. While the decentralized, domain-driven architecture of data mesh holds out the potential for scalable, self-serve data consumption without the bottlenecks common to monolithic big data architecture, ensuring that it delivers on that potential requires specific mission-critical data management capabilities.
Let’s take a look at what’s needed to successfully implement a data mesh architecture within an enterprise environment. We’ll touch on the data management perspectives without which a data mesh implementation will fall flat or limp along, and then delve into the practical capabilities and tools required for data mesh to succeed.
Data Mesh Principles
Paradigms and Perspectives as Prerequisites
How critical can an outlook be to practical success?
Well, when it came to the practical, mathematical business of the equations of relativity, the great mind of Albert Einstein stumbled on and tripped over his own outlook. Einstein’s equations seemed to indicate that the universe was expanding. But that didn’t jibe with Einstein’s ingrained perspective of a static universe. So attached was Einstein to his outlook that he changed his equations so they would mesh with this long-held paradigm.
It took 14 years before Einstein came to terms with and admitted his mistake. That was 14 years of effectively holding back his own practical success – just because of a perspective that he couldn’t let go of.
Implementing data mesh in an enterprise is decidedly more mundane than revealing the inner workings of the universe – but it’s no less susceptible to paradigm problems.
Among the perspectives that are critical to an effective data mesh strategy and implementation are:
Viewing Your Organization as Made up of Different Business Domains
Would you switch a bookkeeper into a sales role under the argument that “well, they both deal with revenue”? Thought not.
Even though business administration and sales both have something to do with revenue, the way each domain defines and relates to it is different. It makes no more sense to impose a single, company-wide interdomain definition of “revenue,” “customer” or any other data asset than it does to assume that all employees can be used interchangeably in any company role.
The data mesh approach is based on the perspective that an organization is made up of different domains, defined according to their business function. Each domain should have its own unified intradomain model of data and its definitions. Interrelationships between the different domains and their models should be explicitly identified.
Data Operations Should be Intentionally Decentralized
One of the core data mesh principles is decentralized operations with centralized standards, otherwise known as a federated governance model.
Operations should be decentralized, relegated to each domain, and conducted by domain-specific, cross-functional teams. Instead of hyper-specialized data engineers who need to deal with requests from domain-specific data owners, producers, and consumers without actually understanding the context of the data, data engineers should be an integrated part of a domain-specific, cross-functional data team. This makes each domain an independent, effective and efficient unit when it comes to dealing with its data.
But no domain is an island. Business administration will need to use the data that sales generate; sales will need to use the data that marketing generates. In order to use a different domain’s data accurately and effectively, an organization must institute centralized standards.
Data is a Product that Domain-Specific Teams Must Make Usable and Available
When you view data as an asset, your responsibility is limited to making sure that the data is usable and available should anyone want to use it. When you view data as a product, your responsibilities go well beyond that, up to and including delivery of that product into the hands of its users.
An effective data mesh is predicated on each domain managing its data as a product. The ultimate goal of any domain-specific, cross-functional data management team is to produce a clean, reliable, helpful data product and actively get it into the hands of any data consumer within the enterprise that can use it.
The Paradigm Imperative
Trying to adopt a data mesh strategy while your organization still clings to centralization, unified definitions of everything, and a feeling that most of the responsibility for effective data use lies with the data consumer will leave you tripping over your own feet.
There’s much to admire about Albert Einstein, but clinging to old paradigms to your own detriment is not included.
Once you’ve got your paradigm ducks in a row, it’s time to move on to…
Practical Needs and Solutions
Once you’ve absorbed the perspectives of a data mesh outlook, there comes the matter of its practical implementation. What kind of data management capabilities do you need? What tools are necessary? The following are three areas of data mesh management and what you need on the ground to make them a reality.
Data Product Management
As mentioned above, the data mesh approach runs on the principle of decentralized data operations conducted by cross-functional teams with deep domain expertise. These domain data teams are expected to have a product-management (as opposed to asset-management) mentality when it comes to their data.
A product, unlike an asset, is assumed to be ready to consume. When you buy ready-to-eat food, you’re not supposed to need to run lab tests on it to determine that it’s safe for consumption! In the same way, the data integrity and usability of a data product are the responsibility of the producer, not the data consumer.
This overarching responsibility for the usability of the data you produce raises the need for both data lineage and data observability capabilities.
Data lineage lets you examine your data’s journey, from where it originated to its final target and everything that happened to it at every stage along the way (integration, aggregation, deduplication, and so on). A comprehensive, automated data lineage solution is important for ensuring data transparency, accuracy, and overall quality. In addition – because issues are bound to happen on any production line – the process transparency provided by data lineage is key to performing root cause analysis and fixing those issues with the speed necessary for effective production.
Planning on making a change to a data pipeline or process? Smart product managers don’t make a change to a production line and wait to see if… oh, something broke. Oops.
An ounce of impact analysis is worth a pound of picking up the pieces afterward. Data lineage not only lets you see backward along your data’s path, but also forward to the potential impact that a pipeline or process change would have. Don’t like what you see? You can change the plans now, or take measures to avoid unwanted consequences before implementing the change.
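The two directions of lineage traversal described above can be sketched with a toy graph. This is a minimal illustration, not any particular lineage product’s API; the asset names, transformations, and function names are all hypothetical.

```python
# Minimal lineage sketch: each edge records the transformation applied.
# upstream[asset] = list of (source_asset, transformation) pairs.
from collections import defaultdict

upstream = {
    "revenue_report": [("orders_clean", "aggregation")],
    "orders_clean": [("orders_raw", "deduplication")],
    "orders_raw": [],
}

def trace_backward(asset):
    """Walk upstream: every source and transformation feeding an asset (root cause analysis)."""
    path = []
    for source, transform in upstream.get(asset, []):
        path.append((source, transform))
        path.extend(trace_backward(source))
    return path

def impact_of(asset):
    """Walk downstream: which assets would a change to `asset` affect (impact analysis)?"""
    downstream = defaultdict(list)
    for target, sources in upstream.items():
        for source, _ in sources:
            downstream[source].append(target)
    affected, stack = set(), [asset]
    while stack:
        for target in downstream[stack.pop()]:
            if target not in affected:
                affected.add(target)
                stack.append(target)
    return affected

print(trace_backward("revenue_report"))  # the upstream journey, with transformations
print(impact_of("orders_raw"))           # everything a change would touch
```

Backward traversal answers “where did this broken number come from?”; forward traversal answers “what breaks if I change this?” – the before-the-fact impact analysis described above.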
While data lineage opens a window into what has already happened or what could potentially happen within your data pipelines, data observability tools enable you to be on top of your pipelines in the moment. When you can see into what’s currently happening within your data pipelines, especially with automated data observability solutions that include milestones, alerts and the like, you can head issues off at the pass. You can avoid pipeline breakage and the need to backtrack and do root cause analysis.
Of course, until you know what you’re looking for, you’ll need the data lineage tool to uncover the root cause for which you can then set an alert. Data observability and data lineage work hand-in-hand with each other to make your pipelines more efficient and resilient.
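The milestone-and-alert style of observability mentioned above can be sketched as a handful of checks against expected thresholds. The metrics and threshold values here are illustrative assumptions, not defaults from any real tool.

```python
# Minimal observability sketch: each pipeline run reports metrics,
# and out-of-range values raise alerts before consumers are affected.

EXPECTED = {
    "row_count_min": 1000,    # alert if a load shrinks suspiciously
    "null_rate_max": 0.05,    # alert if too many nulls appear
    "max_delay_minutes": 30,  # alert if the dataset lands late
}

def check_run(metrics):
    """Return a list of alert strings for one pipeline run."""
    alerts = []
    if metrics["row_count"] < EXPECTED["row_count_min"]:
        alerts.append(f"row count dropped to {metrics['row_count']}")
    if metrics["null_rate"] > EXPECTED["null_rate_max"]:
        alerts.append(f"null rate {metrics['null_rate']:.0%} exceeds threshold")
    if metrics["delay_minutes"] > EXPECTED["max_delay_minutes"]:
        alerts.append(f"delivered {metrics['delay_minutes']} minutes late")
    return alerts

run = {"row_count": 800, "null_rate": 0.02, "delay_minutes": 45}
for alert in check_run(run):
    print("ALERT:", alert)
```

In practice, the alert thresholds are exactly where the two tools meet: a root cause uncovered once via lineage becomes a permanent observability check.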
Data lineage is also a critical support for a reality of every modern data environment: migrations. Data needs and tools change so rapidly that what fit your needs perfectly yesterday is no longer conducive to your growth today. And with the decentralized operations of data mesh, where each domain team has the flexibility to do what’s best for its data product, the barriers to migration are lowered. If your data product team foresees significant gains from moving your data from an onsite Oracle database to Snowflake, or from Snowflake to Amazon Redshift, you don’t have to get buy-in from your entire enterprise.
But even though decentralized operations make the decision to migrate simpler, migration planning and execution are never “simple.” Keeping track of tens of thousands of moving parts, ensuring that everything is packed up correctly at the source and unpacked correctly at the destination, while all the while continuing to provide your data product to enterprise-wide consumers (because operations can’t stop just because you decided to migrate)… there’s no way to avoid some headache. The headache can be significantly reduced, however, by using data lineage visualization to keep track of where everything is, visually compare sources and destinations to make sure the migration is proceeding as planned and locate needed data assets on the fly.
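The “compare sources and destinations” step can be sketched as a simple diff of asset inventories during a migration. The asset names are hypothetical placeholders.

```python
# Sketch: diff the asset inventories of the old and new systems mid-migration.
source_assets = {"orders", "customers", "revenue_report", "returns"}
destination_assets = {"orders", "customers", "revenue_report"}

missing = sorted(source_assets - destination_assets)      # not yet migrated
unexpected = sorted(destination_assets - source_assets)   # shouldn't be there

print("still to migrate:", missing)
print("unexpected at destination:", unexpected)
```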
Federated Data Governance
The key to successful data mesh is federated governance: decentralized operations with centralized standards.
A key component identified on any diagram of data mesh architecture is the data catalog. Each domain data product in a data mesh has its own data catalog based on metadata. The catalog specifies technical definitions, business definitions, user-generated information like ratings and reviews, and any other information helpful in using the data product effectively.
All of these local data catalogs feed into one enterprise-wide data catalog. This data catalog follows the same pattern on a broader scale, clearly laying out the definitions of and relationships between data products. An enterprise-level catalog entry should identify a data product, list its domain association and compare it or describe its relationship to similar data products from other domains.
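A catalog entry of the kind described above can be sketched as a small data structure, with the enterprise catalog as the union of domain catalogs. The field names, products, and domains here are illustrative assumptions, not a catalog standard.

```python
# Sketch of a domain catalog entry and an enterprise-wide roll-up.
from dataclasses import dataclass, field

@dataclass
class CatalogEntry:
    name: str                  # data product name
    domain: str                # owning business domain
    technical_definition: str  # schema/location details
    business_definition: str   # what the product means in domain terms
    related_products: list = field(default_factory=list)  # cross-domain links
    ratings: list = field(default_factory=list)           # user-generated info

sales_catalog = [CatalogEntry(
    name="monthly_revenue",
    domain="sales",
    technical_definition="table sales.monthly_revenue, refreshed daily",
    business_definition="booked revenue, recognized at signing",
    related_products=["finance.recognized_revenue"],
)]

finance_catalog = [CatalogEntry(
    name="recognized_revenue",
    domain="finance",
    technical_definition="table finance.recognized_revenue, refreshed daily",
    business_definition="revenue recognized per accounting standards",
    related_products=["sales.monthly_revenue"],
)]

# The enterprise catalog is the union of domain catalogs, indexed by
# fully qualified name so interdomain relationships resolve.
enterprise_catalog = {
    f"{e.domain}.{e.name}": e for e in sales_catalog + finance_catalog
}
print(enterprise_catalog["sales.monthly_revenue"].related_products)
```

Note how the two domains keep their own definitions of “revenue” while the `related_products` links make the interdomain relationship explicit – the federated pattern in miniature.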
Another critical component for federated governance is an immutable audit log for each data product. These audit logs provide one unchangeable source of truth as to “what happened when” within any data product’s production. They are what feed the organization’s data lineage tool and enable traceability, reproducibility, and verifiability.
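One common way to make a log tamper-evident is to hash-chain its entries, so that any after-the-fact edit is detectable. This is a generic sketch of that technique, not a description of any specific product’s audit log.

```python
# Sketch of an append-only, tamper-evident audit log:
# each entry is chained to the previous one by SHA-256 hash.
import hashlib
import json

def append(log, event):
    prev_hash = log[-1]["hash"] if log else "0" * 64
    entry = {"event": event, "prev": prev_hash}
    entry["hash"] = hashlib.sha256(
        json.dumps({"event": event, "prev": prev_hash}, sort_keys=True).encode()
    ).hexdigest()
    log.append(entry)

def verify(log):
    """True only if no entry has been altered, removed, or reordered."""
    prev_hash = "0" * 64
    for entry in log:
        expected = hashlib.sha256(
            json.dumps({"event": entry["event"], "prev": prev_hash},
                       sort_keys=True).encode()
        ).hexdigest()
        if entry["prev"] != prev_hash or entry["hash"] != expected:
            return False
        prev_hash = entry["hash"]
    return True

log = []
append(log, "orders_raw loaded")
append(log, "orders_clean deduplicated")
print(verify(log))           # the chain is intact
log[0]["event"] = "edited"   # tamper with history...
print(verify(log))           # ...and verification fails
```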
Data Mesh Infrastructure Management
Because of the interdependence between domains in a data mesh, it is important to have ways to identify and fix issues quickly. If there is an issue with sales releasing revenue datasets on time, for example, then business administration – which relies on that data product – is going to be held up.
As with intradomain pipeline issues, data lineage and data observability tools are key to facilitating the smooth resolution of interdomain infrastructure issues.
A strong but flexible data governance solution is also necessary for maintaining control in an environment of federated governance. Key capabilities include seamless coordination of access permissions and policies within and between business domains, the definition of roles and responsibilities on the domain level and on the mesh level, and enforcement of regulatory compliance standards.
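The combination of domain-level grants and mesh-level policy described above can be sketched as a two-layer access check. The roles, domains, and deny rules here are hypothetical, not taken from any governance product.

```python
# Sketch of federated access control: each domain manages its own grants,
# while mesh-level policy (e.g. compliance rules) applies everywhere.

domain_grants = {
    "sales": {"sales_analyst": {"read", "write"}, "finance_analyst": {"read"}},
    "finance": {"finance_analyst": {"read", "write"}},
}

# Mesh-level deny list overrides any domain grant.
mesh_denied = {("sales", "pii_columns")}  # no one reads raw PII from sales

def allowed(role, domain, asset, action):
    """Mesh policy is checked first, then the domain's own grants."""
    if (domain, asset) in mesh_denied:
        return False
    return action in domain_grants.get(domain, {}).get(role, set())

print(allowed("finance_analyst", "sales", "monthly_revenue", "read"))   # cross-domain read
print(allowed("finance_analyst", "sales", "monthly_revenue", "write"))  # not granted
print(allowed("sales_analyst", "sales", "pii_columns", "read"))         # mesh policy wins
```

The design choice to check mesh policy before domain grants is what keeps governance federated rather than merely decentralized: domains decide who uses their data, but centralized compliance standards cannot be overridden locally.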
Gatekeepers of Data Mesh Success
The ultimate success of a data mesh initiative is in your organization’s hands. Do those hands have the needed capabilities? Are they holding the right tools? And are they powered and guided by on-target perspectives and data management paradigms?
If so, onward! And may the mesh be with you.