Data catalogs are becoming increasingly popular for a number of common data management initiatives, including improving data quality and regulatory compliance. The technology is also being used to ensure better self-service BI governance. Data catalogs act as a kind of end-user index for searching data sources and definitions that have been compiled through data discovery, with machine learning that automatically scans and tags data.
Cue the process of seeking out, evaluating, purchasing, and deploying a machine learning data catalog solution. There’s no such thing as a one-size-fits-all approach when it comes to big data. Solutions come in a variety of flavors—ranging from metadata management to data governance and compliance. Choosing the right vendor and solution is a complicated process—one that requires in-depth research and often comes down to more than just the solution and its technical capabilities.
In that spirit, we’ve turned our gaze to the future of data catalog software. Whether it’s the release of an innovative new product, a bump in venture capital, or inclusion in one of the top analyst reports, these are the providers that have earned watch list status from Solutions Review for the year ahead. The vendors are listed in alphabetical order and have specific areas of expertise.
Trademarked ‘The Smart Data Company’, Cambridge Semantics is a data management and analytics provider that offers a semantic layer to connect enterprise data. The company’s flagship product, the Anzo Smart Data Lake, allows users to link, analyze, and manage enterprise data in a variety of formats including structured, unstructured, internal, and external. Cambridge Semantics was recently recognized as a 2019 Trend-Setting Product by DBTA.
Waterline Data offers a data cataloging solution that uses machine learning to discover and manage enterprise data. The tool allows organizations to automatically and incrementally “fingerprint” data and infer its lineage by analyzing data values for relational, cloud, and Hadoop data. Fingerprinting works on the concept that a column of data has a distinctive signature that incorporates its technical metadata, content, format and context. The company raised $14.5 million in Series C funding back in October.
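To make the fingerprinting idea concrete, here is a minimal sketch in Python. It is not Waterline Data's algorithm, which is not described here. It is an illustration under simple assumptions: the names `shape`, `Fingerprint`, `fingerprint`, and `similarity` are hypothetical, a column's "format" is approximated by its most common character pattern, its "content" by its set of distinct values, and "context" is left out entirely.

```python
import re
from collections import Counter
from dataclasses import dataclass


def shape(value: str) -> str:
    """Reduce a value to its format: digits become 9, letters become A."""
    digits_masked = re.sub(r"[0-9]", "9", value)
    return re.sub(r"[A-Za-z]", "A", digits_masked)


@dataclass(frozen=True)
class Fingerprint:
    name: str             # technical metadata: the column name
    dominant_shape: str   # format: most common value pattern
    values: frozenset     # content: distinct values seen in the column


def fingerprint(name: str, column: list[str]) -> Fingerprint:
    """Build a simple signature for one column of string values."""
    shapes = Counter(shape(v) for v in column)
    dominant = shapes.most_common(1)[0][0] if shapes else ""
    return Fingerprint(name, dominant, frozenset(column))


def similarity(a: Fingerprint, b: Fingerprint) -> float:
    """Jaccard overlap of distinct values, forced to 0.0 when formats differ."""
    if a.dominant_shape != b.dominant_shape:
        return 0.0
    union = a.values | b.values
    return len(a.values & b.values) / len(union) if union else 0.0


# Two differently named columns that look like the same kind of data.
orders = fingerprint("zip", ["02139", "10001", "94105"])
customers = fingerprint("postal_code", ["02139", "94105", "60601"])
print(similarity(orders, customers))  # 0.5
```

A high score between columns with different names hints that they hold the same kind of data, which is the sort of signal a catalog could use to suggest tags or infer lineage. A production system would presumably weigh many more features than this sketch does.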
Reltio allows organizations to manage data by utilizing continuous data organization and recommended actions. The vendor’s Self-Learning Data Platform organizes data from a wide variety of sources and formats to create a unified data set with personalized views for users across business departments. Reltio released a Data Quality Confidence Indicator for business users in November, and was named to the Deloitte Technology Fast 500 shortly after.
Unifi’s data catalog gives users the ability to search and discover data using natural language search, regardless of where the data lives or how it is structured. It also includes AI-powered data discovery out of the box, with auto-generated recommendations so users can view and explore datasets. Unifi also enables users to deconstruct TWBX files and view the full lineage of a data source to see how datasets were transformed.