Key Takeaways: Forrester Wave for Machine Learning Data Catalogs, Q4 2020

Solutions Review editors provide key takeaways and analysis from the Forrester Wave for Machine Learning Data Catalogs, Q4 2020.

Technology advisory firm Forrester Research has released its latest Forrester Wave for Machine Learning Data Catalogs, Q4 2020. According to Forrester, machine learning data catalogs are “more than a metadata management tool and marketplace.” Standalone tools provide an enterprise hub spanning the business ecosystem and solution- and platform-based catalog metadata repositories, while machine learning makes it possible to combine a traditional data management business glossary with data stewardship, data preparation, and data marketplaces.

Forrester recommends that those currently evaluating machine learning data catalogs explore providers that power DataOps, data stewardship, and analytic process automation. Solution-seekers should also consider providers that scale data intelligence and lineage from metadata to the endpoint. Machine learning data catalogs are also advancing to support a broader variety of data. The report adds: “Connectors are becoming available for various content and media platforms. In 2018, these tools focused on structured data. Today, connectors and APIs are available to grab metadata from content and media.”

In a 39-criteria evaluation of machine learning data catalogs, Forrester researchers Michele Goetz, Gene Leganza, Boris Evelson, Jennifer Belissent, and Robert Perdoni identified the 10 providers they consider most significant in the software category: Alation, Alex Solutions, Collibra, data.world, erwin, Hitachi Vantara, IBM, Infogix, Informatica, and Io-Tahoe. They then researched, analyzed, and scored each one.

The Wave report details their findings and examines how each vendor meets (or falls short of) Forrester’s evaluation criteria and where vendors stand in relation to each other. The editors at Solutions Review read the report, available here, and these are our key takeaways.

Alation is in a league of its own

Alation is a complete repository for enterprise data, providing a single point of reference for business glossaries, data dictionaries, and Wiki articles. The product profiles data and monitors usage so that users have reliable insight into data accuracy. The recent 2020.3 release brought a new naming convention, an updated quarterly release schedule, and a consumer-grade user interface with faceted search.

According to Forrester, Alation “exploits machine learning at every opportunity” to improve data management, governance, and analytic consumption. Every cataloging function is underpinned by intelligence, and the platform learns from data patterns, queries, searches, and user interactions. Reference customers enjoy Alation’s Behavioral Analysis Engine, which assists data sourcing and preparation by guiding SQL scripting.

Collibra and Alex Solutions are in a dead heat among Leaders

Collibra documents an organization’s technical metadata and how it is used. It describes the structure of a piece of data, its relationship to other data, and its origin, format, and use. The solution serves as a searchable repository for users who need to understand how and where data is stored and how it can be used. A November update includes new capabilities that improve access to critical data and were designed to help users easily find, access, and understand data in more places.

Collibra reference customers speak highly of the vendor’s customer and product support teams, especially when it comes to product input and innovation. Users also highlight that Collibra understands what organizations are doing with data, so it is better prepared to assist. One of the most popular parts of the platform comes when users combine the data intelligence features with included graph technologies.

Alex Solutions is a technology-agnostic unified enterprise data catalog. It features a business glossary that enables users to define and maintain key business terms and link them to physical data assets, processes, and outputs. Policy-driven data quality combines data lineage with data profiling and machine learning-based intelligent tagging. The Alex machine learning data catalog is specifically designed for data stewards and engineers, and the popular lineage profiling capability provides data flow, cross-system flow, application flow, and mobile flow views so users can see data across the digital ecosystem.

data.world has the strongest current offering

data.world offers a cloud-native enterprise data catalog that provides complete context so users can understand their data, regardless of where it resides. The product automatically builds a connected web of data and insights so users can explore relationships as well, and provides recommendations on related assets to improve analysis. data.world is unique due to its continuous release cycle.

The solution provider ranks the highest on Forrester’s vertical axis for strength of current offering. The researcher notes: “It has moved quickly to step out of its niche data engineering position and serve data governance, analytic, and external marketplace objectives.” data.world’s user interface is also intuitive and connected across key data roles, which enables enterprise application of the platform.

Read the Forrester Wave for Machine Learning Data Catalogs, Q4 2020.

Timothy King