Data preparation tools shorten the time spent on the analytics process. Data preparation involves finding, combining, cleaning, and transforming raw data into curated datasets for self-service use cases, which commonly include data integration, analytics and BI, and data science. These tools are becoming increasingly important in enterprise settings and have evolved beyond their original use in self-service environments alone. Providers in this market fall into two camps: those offering standalone solutions and those offering functionality integrated into larger data platforms.
The following providers have earned top scores (4.0 or greater) in analyst house Gartner, Inc.’s Peer Insights software reviews. Each has also been included as having a relevant product in Gartner’s recent Market Guide for Data Preparation Tools. While each company’s market share differs, these tools form the foundation of this software market, and emerging providers can only hope to replicate the success they have earned over time. Because their tools cover a wide variety of use cases, these providers stand out as the most trustworthy of the bunch.
Vendors are ranked in order of popularity among Gartner reference customers.
Alteryx is a self-service analytics software company that specializes in data preparation and data blending. Alteryx Analytics allows users to organize, clean, and analyze data in a repeatable workflow. Business analysts find this tool particularly useful for connecting to and cleansing data from data warehouses, cloud applications, spreadsheets and other sources. The platform features tools to run a variety of analytic jobs (predictive, statistical, spatial) inside a single interface.
Trifacta offers a suite of what it has dubbed ‘data wrangling’ tools in three different iterations: Trifacta Wrangler, Wrangler Edge, and Wrangler Enterprise. Trifacta allows users to do data prep without having to manually write code or use mapping-based systems. The Predictive Transformation function enables the exploration of data content so users can define a recipe for how the data should be transformed. Data Wrangler also includes data discovery, structuring, cleaning, enriching, and validation capabilities.
Altair Knowledge Works (formerly Datawatch) offers a wide variety of data and analytic tools built on its flagship offering, Monarch. The platform is also available in versions for users in enterprise settings and for those who wish to work on a web-based platform. Monarch Swarm offers self-service functionality and creates a social network of curated and raw data sets, with controls and limitations defined for each individual. Monarch Server scales to thousands of users and provides complete automation of repeatable processes.
Paxata’s Adaptive Information Platform offers data integration, quality, and governance capabilities for business analysts. It features flexible deployment options and self-service operation. The provider’s Self-Service Data Prep Application is built on a visual user interface that uses familiar spreadsheet metaphors, so users don’t have to learn an entirely new tool. The app also boasts Assisted Intelligence, which provides algorithmic assistance to infer the meaning of data, while machine learning captures steps for future data work.
The Datameer data preparation platform is integrated with and delivered on Amazon Web Services. Datameer on AWS includes the company’s security, governance, and operationalization features while strengthening its hybrid data architecture. The product makes it possible for companies to deploy a cloud-first analytics approach by letting them process data closer to where it lives. The platform comes with all of the analytics capabilities that have made Datameer so popular over the years.
Unifi was founded by data and enterprise infrastructure experts from Greenplum. Unifi’s data catalog gives users the ability to search and discover data using natural language search, regardless of where the data lives or how it is structured. It also includes AI-powered data discovery out of the box, with auto-generated recommendations so users can view and explore datasets. Unifi also enables users to deconstruct TWBX files and view the full lineage of a data source to see how datasets were transformed.
Cambridge Semantics is a data management and analytics provider that offers a semantic layer to connect enterprise data. The company’s flagship product, the Anzo Smart Data Lake, allows users to link, analyze, and manage enterprise data in a variety of formats, including structured, unstructured, internal, and external. Cambridge Semantics was recently recognized with a 2019 Trend-Setting Product award by Database Trends and Applications.