IT news and analysis outlet CRN recently released its 2020 (and eighth annual) Big Data 100, a ranking of prominent big data technology vendors that solution providers should be aware of. The list comprises established and emerging big data tools vendors and is broken down into five product categories: business analytics, database systems, data management and data integration software, big data platforms, and data science and machine learning tools.
CRN pre-published a list of The Coolest Data Management and Integration Tool Companies included in the overall list via an interactive slideshow. Though the Big Data 100 is aimed at highlighting software vendors for the purposes of solution provider partnering, Solutions Review is most interested in highlighting the vendors that offer unique products and platforms for enterprise organizations. As such, we’ve read through CRN’s complete rankings, available here, to analyze the trending data integration tools companies we think matter most. For an even deeper breakdown of data integration software, tools, vendors and platforms, consult our popular Buyer’s Guide.
Actian offers data integration software in on-prem and cloud editions, DataConnect and DataCloud. DataConnect is a hybrid solution that enables users to design, deploy, and manage integrations without limits on data types or volumes. DataCloud is an elastic, on-demand services platform, powered by Amazon Web Services, for deploying and managing hybrid, on-prem, or cloud-to-cloud integrations.
Alluxio enables data orchestration for compute in any cloud. The product unifies data silos on-prem and across any cloud to provide data locality, accessibility, and elasticity. Alluxio is scalable to over a billion files in a single cluster, and its distributed architecture is built on three core components: Alluxio Master (manages file and object metadata), Alluxio Worker (manages node local space), and Alluxio Client (AI/ML application interface). The product also includes support for hyperscale workloads, flexible APIs, security, and monitoring and management.
The Denodo Platform offers data virtualization for joining multistructured data sources from database management systems, documents, and a wide variety of other big data, cloud, and enterprise sources. Connectivity support includes relational databases, legacy data, flat files, XML, packaged applications, and emerging data types including Hadoop. Denodo is the only data virtualization solution to be provisioned as a virtual image on Amazon AWS Marketplace.
Diyotta is a unified data integration platform that integrates with modern data lake and data warehousing environments. The drag-and-drop user interface and native processing capabilities make this product one to consider. Diyotta enables shorter development times, faster data movement, and reusability across the enterprise to make future development simple. Diyotta touts the industry’s first data integration software to leverage modern data processing platforms like Hadoop, Snowflake, Google BigQuery, and Amazon Redshift.
Fivetran is an automated data integration platform that delivers ready-to-use connectors, transformations and analytics templates that adapt as schemas and APIs change. The product can sync data from cloud applications, databases, and event logs. Integrations are built for analysts who need data centralized but don’t want to spend time maintaining their own pipelines or ETL systems. Fivetran is easy to deploy, scalable, and offers some of the best security features of any provider in the space.
Informatica’s data integration tools portfolio includes both on-prem and cloud deployments for a number of enterprise use cases. The vendor combines advanced hybrid integration and governance functionality with self-service business access for various analytic functions. Augmented integration is possible via Informatica’s CLAIRE Engine, a metadata-driven AI engine that applies machine learning. Informatica touts strong interoperability between its growing list of data management software products.
Infoworks offers an automated data operations and orchestration platform called DataFoundry. The product provides a no-code environment for configuring the ingestion of data (batch, streaming, change data capture) from a variety of data sources. Infoworks uses native connectors where possible to ingest source data while automatically preserving data precision. It automatically crawls data sources and relational databases, learns the metadata, and infers data relationships for data ingested from external sources.
Matillion offers data integration software for cloud data warehouses, and was designed for Amazon Redshift, Snowflake, and Google BigQuery. The product works by allowing users to consolidate large data sets and quickly perform data transformations. Expert technical support, provided by Matillion solution architects, is included free of charge. Full support is offered throughout the customer lifecycle, including trial and complex use case development. Matillion includes more than 70 pre-built connectors.
Striim offers a real-time data integration solution that enables continuous query processing and streaming analytics. Striim integrates data from a wide variety of sources, including transaction/change data, events, log files, and application and IoT sensor data, and supports real-time correlation across multiple streams. The platform features pre-built data pipelines, out-of-the-box wizards for configuration and coding, and a drag-and-drop dashboard builder.
StreamSets offers a DataOps platform that features smart data pipelines with built-in data drift detection and handling, as well as a hybrid architecture. The product also includes automation and collaboration capabilities across the design-deploy-operate lifecycle. StreamSets monitors data in-flight to detect changes and predicts downstream issues to ensure continuous delivery without errors or data loss. The tool’s live data map, data performance SLAs and data protection functionality are major value-adds.
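To make the idea of data drift detection concrete, here is a minimal sketch in Python of checking an incoming record against an expected schema. It is an illustration only and does not use or represent StreamSets’ own API; the function name `detect_drift`, the schema, and the sample record are all hypothetical.

```python
from typing import Any


def detect_drift(expected: dict[str, type], record: dict[str, Any]) -> dict[str, list[str]]:
    """Compare one record against an expected schema (hypothetical helper).

    Reports fields that were added, fields that went missing, and fields
    whose value no longer matches the expected Python type.
    """
    added = sorted(k for k in record if k not in expected)
    missing = sorted(k for k in expected if k not in record)
    retyped = sorted(
        k for k, t in expected.items()
        if k in record and not isinstance(record[k], t)
    )
    return {"added": added, "missing": missing, "retyped": retyped}


# Assumed schema and sample record, for illustration only.
expected_schema = {"id": int, "email": str, "amount": float}
incoming = {"id": 7, "email": "a@example.com", "amount": "12.50", "currency": "USD"}

report = detect_drift(expected_schema, incoming)
print(report)
```

In this sample, the upstream source has started sending `amount` as a string and has added a `currency` field, so both show up in the report. A production pipeline would typically act on such a report (for example, by routing or coercing the record) rather than only printing it.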
Syncsort offers its data integration capabilities via two product families, Syncsort Connect and Syncsort Ironstream. The company’s flagship data integration tools are the Syncsort Connect product family. Syncsort allows users to speed up database queries and applications by putting relational databases to best use. The Intelligent Execution feature dynamically selects the most efficient algorithms based on the data structures and system attributes it encounters at run-time.
Talend offers an expansive portfolio of data integration and data management tools. The company’s flagship tool, Open Studio for Data Integration, is available via a free open-source license. Talend Integration Cloud is offered in three separate editions (SaaS, hybrid, elastic), and provides broad connectivity, built-in data quality, and native code generation to support big data technologies. Big data components and connectors include Hadoop, NoSQL, MapReduce, Spark, machine learning and IoT.
Tamr offers a machine learning-based data integration product called Unify. The solution allows organizations to connect to any tabular data and publish it anywhere. Users can map schemas with machine learning suggestions and normalize data formats using Spark and SQL. Tamr’s Master Records feature also provides a complete view of all entities, built up via simple yes-or-no questions. Tamr has also begun offering an issue tracker specifically designed for data called Steward (beta).
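As a rough illustration of what suggested schema mappings look like, the Python sketch below proposes a target column for each source column. It swaps in plain string similarity from the standard library’s `difflib` in place of machine learning, so it is not Tamr’s method or API; the function `suggest_mappings`, the column names, and the 0.6 cutoff are assumptions made for the example.

```python
import difflib
from typing import Optional


def suggest_mappings(
    source_cols: list[str],
    target_cols: list[str],
    cutoff: float = 0.6,  # assumed similarity threshold, not a tuned value
) -> dict[str, Optional[str]]:
    """Suggest one target column per source column using string similarity.

    Returns None for a source column when no target is similar enough,
    leaving that mapping for a person to decide.
    """
    # Compare case-insensitively, but return the original target names.
    lowered = {t.lower(): t for t in target_cols}
    suggestions: dict[str, Optional[str]] = {}
    for col in source_cols:
        matches = difflib.get_close_matches(col.lower(), list(lowered), n=1, cutoff=cutoff)
        suggestions[col] = lowered[matches[0]] if matches else None
    return suggestions


# Hypothetical column names, for illustration only.
suggestions = suggest_mappings(
    ["cust_name", "EmailAddr", "zip"],
    ["customer_name", "email_address", "postal_code"],
)
for source, target in suggestions.items():
    print(f"{source} -> {target}")
```

Here `zip` gets no suggestion because it shares almost no characters with `postal_code`. Matches that depend on meaning rather than spelling are the kind of case where a learned model and a human reviewer answering yes-or-no questions would be expected to do better than string matching.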
Trifacta offers a suite of what it has dubbed ‘data wrangling’ tools in three different iterations: Trifacta Wrangler, Wrangler Edge, and Wrangler Enterprise. Trifacta allows users to do data prep without having to manually write code or use mapping-based systems. The Predictive Transformation function enables the exploration of data content so users can define a recipe for how the data should be transformed. Wrangler also includes data discovery, structuring, cleaning, enriching, and validation capabilities.