Data preparation involves organizing, cleaning and unifying data into one place so that it can be used for analysis. Data prep tools are often deployed to help organizations work with data that is inconsistent and unstructured. These solutions are often used alongside data integration and processing software when combining data from multiple sources. Data prep is particularly useful when reporting on data that was captured manually. It’s equally vital during mergers and acquisitions, and when siloed data systems are being brought together.
In that spirit, we’ve turned our gaze to the future of data preparation. Whether it’s inclusion in a recent analyst report, the release of an innovative new tool, or a bump in venture funding, these are the providers that have earned watch list status for the year ahead.
1. ClearStory Data
ClearStory Data’s flagship product offers ways for organizations to discover, prepare, and blend data from structured and unstructured sources. ClearStory’s Data Inference tool automates the data preparation process for any data source by inferring semantics in business data, reading values, and automating transformations. The platform’s Intelligent Data Harmonization capabilities automatically produce visually blended insights by identifying relationships across dimensions in data.
2. Datawatch
Datawatch offers a wide variety of data and analytic tools atop its flagship offering, Datawatch Monarch. The platform is broken down further for users in enterprise settings and for those who wish to work on a web-based platform. Monarch Swarm offers self-service functionality and creates a social network of curated and raw data sets with controls and limitations defined for each individual. Monarch Server scales to thousands of users and provides complete automation on repeatable processes.
3. Lavastorm
Lavastorm Server provides capabilities that run alongside data throughout the entire lifecycle. From data acquisition to data preparation and analytics, Lavastorm combines self-service with advanced features for both business users and IT. Lavastorm Server integrates data with operational ETL, and allows users to scale up their applications via extensible nodes. The tool also touts an immediate start-up with pre-configured components that hasten common data preparation work.
4. Paxata
Paxata’s Adaptive Information Platform offers data integration, quality, and governance capabilities for business analysts. It features flexible deployment options and self-service operation. The provider’s Self-Service Data Prep Application is built on a visual user interface that has familiar spreadsheet metaphors so users don’t have to learn an entirely new tool. The app also boasts Assisted Intelligence, which provides algorithmic assistance to infer the meaning of data, while machine learning captures steps for future data work.
5. Trifacta
Trifacta offers a suite of what it has dubbed ‘data wrangling’ tools in three different iterations: Trifacta Wrangler, Wrangler Edge, and Wrangler Enterprise. Trifacta allows users to do data prep without having to manually write code or use mapping-based systems. The Predictive Transformation function enables the exploration of data content so users can define a recipe for how the data should be transformed. Data Wrangler also includes data discovery, structuring, cleaning, enriching, and validation capabilities.
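For readers unfamiliar with the term, a transformation "recipe" can be thought of as an ordered list of cleaning steps applied to every record. The sketch below illustrates that general idea in plain Python. It is not Trifacta's API or implementation; the function names, the `email` field, and the sample record are all hypothetical and chosen only for illustration.

```python
from typing import Callable

# A record is one row of data; a step is any function that returns a cleaned copy.
Record = dict[str, str]
Step = Callable[[Record], Record]


def strip_whitespace(record: Record) -> Record:
    """Trim leading and trailing whitespace from every value."""
    return {key: value.strip() for key, value in record.items()}


def lowercase_email(record: Record) -> Record:
    """Lowercase the (hypothetical) 'email' field, if the record has one."""
    if "email" in record:
        return {**record, "email": record["email"].lower()}
    return record


def apply_recipe(records: list[Record], recipe: list[Step]) -> list[Record]:
    """Run every step of the recipe, in order, over every record."""
    cleaned = []
    for record in records:
        for step in recipe:
            record = step(record)
        cleaned.append(record)
    return cleaned


# The recipe is just an ordered list of steps; the order matters.
recipe = [strip_whitespace, lowercase_email]
raw = [{"name": "  Ada Lovelace ", "email": " ADA@Example.com "}]
print(apply_recipe(raw, recipe))
# → [{'name': 'Ada Lovelace', 'email': 'ada@example.com'}]
```

Because each step returns a new record rather than editing the input, the same recipe can be saved and replayed on new data without altering the raw source, which is roughly the appeal of recipe-style preparation tools.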