IT news and analysis outlet CRN recently released its 2020 (and eighth annual) Big Data 100, a ranking of prominent big data technology vendors that solution providers should be aware of. The list includes both established and emerging big data tools vendors and is broken down into five product categories: business analytics, database systems, data management and data integration software, big data platforms, and data science and machine learning tools.
CRN pre-published a list of The Coolest Data Science and Machine Learning Tool Companies included in the overall list via an interactive slideshow. Though the Big Data 100 is aimed at highlighting software vendors for the purposes of solution provider partnering, Solutions Review is most interested in highlighting the vendors that offer unique products and platforms for enterprise organizations. As such, we've read through CRN's complete rankings to analyze the trending data science and machine learning companies we think matter most. For an even deeper breakdown of advanced data analytics software, tools, vendors and platforms, consult our popular Buyer's Guide.
Anaconda
Anaconda is an open source Python and R data science platform. The tool enables you to perform data science and machine learning on Linux, Windows, and macOS. The product allows users to download more than 1,500 Python and R data science packages, manage libraries, dependencies, and environments, and analyze data with Dask, NumPy, pandas, and Numba. You can then visualize results generated in Anaconda with Matplotlib, Bokeh, Datashader, and HoloViews.
Big Squid
Big Squid offers an automated machine learning platform called Kraken. The product features two-way connections to an organization's existing data platforms. Kraken's machine learning models help uncover data quality issues before prediction. Kraken insights surface hidden correlations in your data, including the weight each metric carries on the end outcome. An ML-driven scenario planning feature lets users see the outcomes of taking different actions and plan for likely unknowns.
Dataiku
Dataiku offers an advanced analytics solution that allows organizations to create their own data tools. The company's flagship product features a team-based user interface for both data analysts and data scientists. Dataiku's unified framework for development and deployment provides immediate access to all the features needed to design data tools from scratch. Users can then apply machine learning and data science techniques to build and deploy predictive data flows.
DataRobot
DataRobot offers an enterprise AI platform that automates the end-to-end process of building, deploying, and maintaining AI. The product is powered by open source algorithms and can be used on-premises, in the cloud, or as a fully managed AI service. DataRobot includes three independent but fully integrated tools (Automated Machine Learning, Automated Time Series, and MLOps), and each can be deployed in multiple ways to match business needs and IT requirements.
Domino Data Lab
Domino Data Lab offers an enterprise data science platform that allows data scientists to build and run predictive models. The product helps organizations develop and deliver these models through infrastructure automation and collaboration. Domino gives users access to a Data Science Workbench, which offers open source and commercial tools for batch experiments, as well as Model Delivery, which lets them publish APIs and web apps or schedule reports.
dotData
dotData offers an enterprise data science automation platform. The product can be used by both advanced and citizen data scientists. dotData works with flat files as well as relational data sets, automatically discovering table relationships and preparing data for feature engineering. Users can also generate features and train models automatically using Python ML algorithms. Because the platform is API-based, you can also validate model accuracy and retrain models on the fly.
H2O.ai
H2O.ai offers a range of AI and data science platforms. Its H2O platform is a fully open source, distributed in-memory machine learning platform with linear scalability. H2O supports widely used statistical and machine learning algorithms, including gradient boosted machines, generalized linear models, deep learning and more. H2O has also developed AutoML functionality that automatically runs through all the algorithms to produce a leaderboard of the best models.
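H2O's AutoML is driven through H2O's own clients, and the sketch below does not use that API. It is a minimal scikit-learn illustration, under our own assumptions, of the general leaderboard idea described above: fit several candidate algorithms, score each with cross-validation, and rank them best-first. The candidate labels, the bundled breast-cancer dataset, and the ROC AUC metric are illustrative choices, not H2O's defaults.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def build_leaderboard(X, y, candidates, cv=5):
    """Score each candidate with k-fold cross-validation and
    return (name, mean ROC AUC) pairs sorted best-first."""
    rows = []
    for name, model in candidates.items():
        scores = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
        rows.append((name, scores.mean()))
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Illustrative data: a small binary classification set bundled with scikit-learn.
X, y = load_breast_cancer(return_X_y=True)

# Hypothetical candidate set loosely mirroring the algorithm families named above
# (a generalized linear model and gradient boosted machines), plus a random forest.
candidates = {
    "glm": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "gbm": GradientBoostingClassifier(random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
}

leaderboard = build_leaderboard(X, y, candidates)
for rank, (name, auc) in enumerate(leaderboard, start=1):
    print(f"{rank}. {name}: mean CV AUC = {auc:.4f}")
```

A real AutoML system also searches hyperparameters, handles time budgets, and often adds stacked ensembles to the leaderboard; this sketch only shows the ranking step.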
Iguazio
Iguazio offers a data science platform that automates workflows. The product can ingest multi-model data, such as event-driven streaming, time series, NoSQL, SQL and files, in real time. Iguazio lets users explore and manipulate data online and offline, and the platform is powered by a real-time data layer that supports a variety of pre-installed data science and analytics frameworks. Users can also train models continuously in a production-like environment that dynamically scales GPUs and managed machine learning frameworks.
KNIME
KNIME Analytics Platform is open source software for data science. It enables the creation of visual workflows through a drag-and-drop graphical interface that requires no coding. Users can choose from more than 2,000 nodes to build workflows, model each step of analysis, control the flow of data, and ensure work is current. KNIME can blend data from any source and shape it to derive statistics, clean data, and extract and select features. The product leverages AI and machine learning, and can visualize data with classic and advanced charts.
RapidMiner
RapidMiner offers a data science platform that enables people of all skill levels across the enterprise to build and operate AI solutions. The product covers the full lifecycle of the AI production process, from data exploration and data preparation to model building, model deployment, and model operations. RapidMiner provides the depth that data scientists need, but simplifies AI for everyone else via a visual user interface that streamlines the process of building and understanding complex models.