Independent data and analytics analyst Philip Russom, PhD, offers commentary on the Gartner view of the data fabric from the recent Gartner Data & Analytics Summit 2023.
I had the honor and pleasure of attending the Gartner Data & Analytics Summit, held March 19-22, 2023 in Orlando, Florida. It was an impressively large and well-organized event that covered most aspects of data and analytics (D&A), plus their best practices, tools, technologies, and team structures. However, in my opinion, the topic addressed most often and most profoundly was the data fabric. Allow me to summarize the Gartner View of the Data Fabric, as presented at the Gartner D&A Summit 2023.
Defining the Data Fabric
Data fabric has become the leading-edge paradigm for data management development, deployment, and automation. Without a fabric, organizations struggle with data availability and access, data standards, governance, data engineer productivity, and time to value for analytics and other data products.
Industry analysts at Gartner define the data fabric as an architecture and set of best practices for unifying and governing multiple data management disciplines, including data integration, quality, active metadata, master data, pipelines, catalogs, orchestration, analytics, DataOps, and much more. “Unified” means that the diverse tools of a data fabric must interoperate deeply, in both development and production. Unification and interoperability across the data fabric tool portfolio may be achieved via a common graphical user interface (GUI), application programming interfaces (APIs), data standards, user methods, shared data products and objects, and shared metadata and other semantics.
To achieve the scale, agility, and productivity required for a production data fabric, the tools used should ideally support automation, ranging from old-fashioned business rules to cutting-edge smart algorithms (perhaps based on machine learning) that recommend or automatically perform data engineering actions. Other data fabric capabilities either required or recommended by Gartner analysts include intelligent orchestration, composable architecture, data catalogs, and knowledge graph databases and analytics (to represent and analyze data objects collected via active metadata).
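To illustrate the "old-fashioned business rules" end of that automation spectrum, here is a minimal sketch in which simple rules inspect a column profile and recommend data engineering actions. The profile fields, thresholds, and suggested actions are all illustrative assumptions, not part of Gartner's definition.

```python
# Rule-based recommendation of data engineering actions from a column
# profile. Field names and thresholds are hypothetical examples.
def recommend_actions(profile: dict) -> list[str]:
    """Map simple data-profile statistics to suggested actions."""
    actions = []
    if profile.get("null_ratio", 0.0) > 0.2:
        # Many missing values: suggest a cleansing step.
        actions.append("add null-handling / imputation step")
    if profile.get("distinct_ratio", 1.0) < 0.01:
        # Very few distinct values: likely a categorical dimension.
        actions.append("consider encoding as a lookup / dimension table")
    if not profile.get("documented", False):
        # Undocumented asset: recommend catalog registration.
        actions.append("register column in the data catalog")
    return actions

print(recommend_actions({"null_ratio": 0.35, "documented": False}))
# → ['add null-handling / imputation step', 'register column in the data catalog']
```

A smarter fabric would replace or augment these hand-written rules with learned models, but the interface (profile in, recommended actions out) stays the same.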
The data fabric capabilities mentioned above are numerous. Many are advanced (e.g., knowledge graphs, automation via machine learning) or in a rudimentary state of evolution (active metadata, DataOps). For these reasons, many data management professionals and other technical users find the Gartner definition of the data fabric to be quite challenging to understand and implement.
Hence, most data and analytics leaders don’t know where to begin when designing and deploying a data fabric. Likewise, it is not obvious how to extend existing data management solutions to evolve into a data fabric. Luckily, the Gartner D&A Summit 2023 included some sessions that provided clear explanations of the data fabric and its key component, active metadata, plus how to approach implementing them. I will now summarize those sessions.
The “Practical” Data Fabric
One of the most useful sessions I attended at the Gartner D&A Summit was “The Practical Data Fabric — How to Architect the Next Generation Data Management Design,” presented by Ehtisham Zaidi, VP Analyst at Gartner Inc. The presenter made the data fabric more understandable, and he stressed the benefits of the fabric.
For example, the data fabric offers something for everyone:
- Business users can quickly find, integrate, analyze, and share data, even in a self-service mode.
- Data management teams get greater productivity (via automated data access and interaction) and agility (they can close data requests sooner).
- The enterprise experiences faster time to insight from data and analytics investments, plus improved data literacy, which increases the utilization of data.
As further proof of fabric benefits, the presentation quoted a recent Gartner prediction: “By 2025, active metadata-assisted automated functions in the data fabric will reduce human effort by half and quadruple data utilization efficiency.”
The greatest contribution of this presentation, however, is the procedure of nine steps that Ehtisham Zaidi shared with us. The procedure answers the tough questions many users ask: Where do we start? In what order should we proceed? What is the operating model for a data fabric? What advanced levels should we aspire toward?
Here is my summary of Ehtisham Zaidi’s nine steps for data fabric design and development:
- Collect Passive Metadata: Ideally, this encompasses all forms of metadata, including technical, operational, business, and social metadata, from a wide range of operational sources.
- Activate Metadata: In other words, automate metadata collection, monitor its state and use, and analyze metadata for insights into business processes and entities.
- Create Knowledge Graphs: The graphs can represent the objects discovered via metadata, as well as their relationships and semantics. A knowledge graph that describes multi-relationship data can enable analytics based on active metadata.
- Use Recommendations from the Data Fabric for Automation: For example, well-developed metadata management can enable valuable practices, such as self-service data access, data lineage, cataloging, and the analysis of metadata (instead of analyzing physical data in storage).
- Explore Self-Service Orchestration Opportunities: A data fabric does not require data management optimization, but it does enable it, with or without automation.
- Utilize DataOps to Streamline Data Integration Delivery: DataOps is an emerging method for operationalizing the development and delivery of data products, which yields greater speed, developer productivity, and business alignment. DataOps provides these same benefits to the data fabric.
- Deliver Integrated Data as a Data Product: A data fabric can provision data for a wide range of data products, including tables, cubes, metrics, dimensions, semantics, and datasets.
- Adopt a Hub-and-Spoke Data and Analytics Operating Model: A data fabric can be the centralized and governed “single version of truth” which supports data products for many business domains and their satellite teams, perhaps in a data mesh architecture. This shows that data fabric (central and standardized) and data mesh (federated and domain specific) are complementary and can coexist.
- Focus on Foundations, then Progress to Advanced Levels: According to Gartner analyst Ehtisham Zaidi, there are three paths to a production data fabric:
The Foundation Path
Start with known data that can answer known questions, and use standard data integration tools and practices, plus data cataloging and the DataOps method.
The Advanced Path
Embrace unknown data and unknown questions. Use the tools and methods of the foundation path, but add knowledge graphs (as a representation of data relationships) as an enabler for advanced analytics.
The Automation Path
Satisfy a need for automation, which is critical to data fabric speed, productivity, and depth of analytic insight. To the other two paths, add active metadata and a recommendation engine.
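The first three of the nine steps (collect passive metadata, activate it, build a knowledge graph) can be sketched in a few lines of code. This is a minimal illustration under my own assumptions; the class, field, and event names are hypothetical, not Gartner's.

```python
# Sketch of steps 1-3: collect passive metadata events, "activate" them by
# analyzing usage, and represent observations as knowledge-graph triples.
from dataclasses import dataclass, field
from collections import defaultdict
from datetime import datetime, timezone

@dataclass
class MetadataEvent:
    """One passive metadata record: who touched which data asset, and how."""
    user: str
    action: str   # e.g., "read", "join", "export"
    asset: str    # e.g., a table or dataset name
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

class ActiveMetadataStore:
    """Activates metadata: collects events and derives simple insights."""
    def __init__(self):
        self.events: list[MetadataEvent] = []
        # (subject, predicate, object) triples for a knowledge graph.
        self.graph: set[tuple[str, str, str]] = set()

    def record(self, event: MetadataEvent) -> None:
        self.events.append(event)
        # Step 3: represent the observation as a knowledge-graph triple.
        self.graph.add((event.user, event.action, event.asset))

    def usage_counts(self) -> dict[str, int]:
        """A trivial 'analysis of metadata': how often each asset is used."""
        counts: dict[str, int] = defaultdict(int)
        for e in self.events:
            counts[e.asset] += 1
        return dict(counts)

store = ActiveMetadataStore()
store.record(MetadataEvent("ana", "read", "sales.orders"))
store.record(MetadataEvent("ben", "join", "sales.orders"))
store.record(MetadataEvent("ana", "export", "hr.payroll"))
print(store.usage_counts())  # sales.orders used twice, hr.payroll once
```

The later steps (recommendations, orchestration, DataOps) would build on exactly this kind of store, mining the accumulated events and triples rather than the physical data itself.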
Data Fabric vs. Data Mesh
One of the more provocative sessions I attended at the Gartner D&A Summit was “Data Fabric or Data Mesh: Debate on Deciding Your Future Data Management Architecture,” presented by Ehtisham Zaidi, VP Analyst and Robert Thanaraj, Director Analyst at Gartner Inc.
Proponents of the data mesh tout it as a next-generation data management architecture based on domain-driven, distributed data management. The data fabric, on the other hand, provides an enterprise-wide infrastructure for centralized (but not necessarily consolidated) data management. Data mesh consultants regularly recommend that enterprises disassemble centralized competency centers for data management, to be replaced by multiple small teams at the department or business unit level. The data fabric, as defined by Gartner analysts, recommends that enterprises utilize their existing data management infrastructure and team structures, while evolving them to support data fabric technologies and methods from a central “version of the truth” for most enterprise data. Both data mesh and data fabric champion new practices, such as DataOps and data as a product.
The main advantage of this definition of data mesh is that the resulting data products align well with the business domain that the data and analytics team reports to, as compared to data products that come from a distant, central team and its data. There is some truth to this, but the downside is that the meshed teams are cut off from the standards, governance, and shared innovations of a central fabric and its team. And reinventing the wheel across multiple distributed teams is more expensive than drawing on a shared pool of data and analytics specialists. Plus, we all know that siloed departmental data has a variety of problems. Users must weigh these trade-offs when contemplating either approach.
Near the end of their session, the presenters, Ehtisham Zaidi and Robert Thanaraj, pointed out that the comparisons lose some relevance once we realize that data mesh and data fabric are very different and target different levels of the technology stack. We can even say they have complementary strengths and weaknesses, such that deploying both can be desirable for some enterprises. For example, a compromise seen among some Gartner clients is to maintain a central infrastructure and team for shared enterprise data (i.e., data fabric with enterprise governance), while deploying multiple autonomous teams for analytics focused on domain issues and goals (i.e., data mesh with federated governance).
The Active Metadata Helix
You may have noticed that metadata plays a prominent role in the data fabric, as defined by Gartner analysts. For example, the first four of Ehtisham Zaidi’s nine steps for data fabric design (listed above) are all about metadata, plus new and innovative methods for managing and using metadata, such as active metadata and knowledge graphs.
At the Gartner D&A Summit, Gartner analyst Mark Beyer drilled into a number of metadata-based innovations in his presentation “The Active Metadata Helix: The Benefits of Automating Data Management.”
According to Beyer: “Metadata is generated practically every time data is accessed in any tool, platform or application. This is a largely untapped resource that is real-time documentation of exactly how, when and why any person in the enterprise uses data from any and all assets available to them. ‘Active metadata’ is the conversion of these otherwise passive observations into ML-enabled, automated data management.”
The presenter introduced the idea of the ‘active metadata helix.’ For many business processes and entities, metadata records data triples: subject, predicate, and object. Combining these data points forms a multi-layered helix, a series of interconnected triples. Starting from any common data point, all related triples can be traced. This is valuable to the business because it reveals the details of data utilization, supporting analytics about data lineage, compliant (or non-compliant) usage, user behaviors, levels of consumption (or lack of consumption), and so on. “Metadata is observable evidence of the data experience in an organization,” said Beyer.
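Tracing connected triples through shared data points, in the spirit of that helix, is essentially a graph traversal. The sketch below shows the idea with a lineage-style trace; the triples and predicate names are illustrative assumptions, not Gartner's design.

```python
# Trace all (subject, predicate, object) triples reachable from one data
# point, by repeatedly following any triple that shares a node.
from collections import deque

triples = [
    ("report.q1", "derived_from", "sales.orders"),
    ("sales.orders", "loaded_by", "pipeline.nightly"),
    ("pipeline.nightly", "owned_by", "team.data-eng"),
    ("dashboard.exec", "derived_from", "report.q1"),
]

def trace(start: str) -> set[tuple[str, str, str]]:
    """Breadth-first trace: collect every triple connected to `start`."""
    seen, frontier, found = {start}, deque([start]), set()
    while frontier:
        node = frontier.popleft()
        for s, p, o in triples:
            if node in (s, o) and (s, p, o) not in found:
                found.add((s, p, o))
                for nxt in (s, o):
                    if nxt not in seen:
                        seen.add(nxt)
                        frontier.append(nxt)
    return found

# Starting from one data point, all connected triples are reachable,
# answering questions like "what depends on sales.orders, and who owns it?"
lineage = trace("sales.orders")
print(len(lineage))  # all four triples connect back to sales.orders
```

Real implementations would run such traversals in a graph database rather than in application code, but the principle (any common data point unlocks the whole chain of triples) is the same.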
Near the end of his session, Mark Beyer made a provocative recommendation: “Stop designing data so much. Start observing it, then build alerts that notify people about data and the events and entities it represents.” You need automation for that, which is where active metadata comes in. “Data management tools already ‘do’ metadata. Observe and listen to users by activating metadata.”
Note that Beyer’s vision for the future of metadata management and Gartner’s definition of data fabric both assume that fully automated active metadata will become a common data management tool function and technical user practice. Recent editions of the Gartner Hype Cycle for Data Management project that active metadata and the data fabric will mature, but not for many years. Both have adoption rates of 5% or less of their addressable markets, according to anecdotal remarks made at the summit by Gartner analysts. So, don’t expect active metadata and the data fabric to be commonplace soon. But rest assured: they are coming.
- Gartner D&A Summit 2023: The Gartner View of the Data Fabric - March 30, 2023