Hitachi Vantara recently announced new machine learning orchestration capabilities targeted toward data scientists, allowing them to monitor, test, retrain, and redeploy supervised models in production. The announcement was made at the Strata + Hadoop World conference in San Jose. Hitachi Vantara Labs, an applied lab staffed with top industry experts, is responsible for the innovation, collectively known as machine learning model management.
Pentaho users can put these new tools to use in a data pipeline to make it easier to update models in response to continual change. Hitachi Vantara Labs is making the machine learning model management available as a plug-in via the Pentaho Marketplace. The machine learning orchestration steps evaluate models against real production data and improve their accuracy before the models go live. Data operations teams can also check how well models generalize to production test data using a choice of cross-validation and holdout evaluation techniques.
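For readers unfamiliar with the two evaluation techniques named above, the following is a minimal sketch of holdout and k-fold cross-validation in plain Python. It is not the Pentaho plug-in's API; the toy threshold classifier, the one-feature dataset, and every function name here are assumptions made purely for illustration.

```python
import random
from typing import Callable, List, Sequence, Tuple

Example = Tuple[float, int]      # (feature, label); hypothetical one-feature dataset
Model = Callable[[float], int]   # a model maps a feature value to a predicted label


def train_threshold(data: Sequence[Example]) -> Model:
    """Toy learner: predict 1 when the feature exceeds the midpoint of the two class means."""
    pos = [x for x, y in data if y == 1]
    neg = [x for x, y in data if y == 0]
    if not pos or not neg:
        raise ValueError("training data must contain both classes")
    cut = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: 1 if x > cut else 0


def accuracy(model: Model, data: Sequence[Example]) -> float:
    """Fraction of examples the model labels correctly."""
    return sum(model(x) == y for x, y in data) / len(data)


def holdout_score(data: Sequence[Example], test_fraction: float, seed: int) -> float:
    """Holdout: train on one part of the shuffled data, score on the unseen remainder."""
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    test, train = shuffled[:n_test], shuffled[n_test:]
    return accuracy(train_threshold(train), test)


def k_fold_score(data: Sequence[Example], k: int, seed: int) -> float:
    """k-fold cross-validation: each fold serves once as the test set; scores are averaged."""
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)
    folds: List[List[Example]] = [shuffled[i::k] for i in range(k)]
    scores = []
    for i, test in enumerate(folds):
        train = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        scores.append(accuracy(train_threshold(train), test))
    return sum(scores) / k
```

Holdout is cheaper because it trains once, while k-fold uses every example for testing exactly once and so tends to give a steadier estimate on small datasets.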
To guard against accuracy degradation, a new range of evaluation statistics helps identify degraded models. Visualizations and reports also make it easier to analyze model performance and discover errors. When updates need to be made, new “challenger” models can be A/B-tested against the current “champion” models.
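The champion and challenger idea can be sketched roughly as follows. This is an illustrative offline comparison on the same recent labeled data, not a live traffic-splitting A/B test and not the plug-in's actual implementation; the function names, the accuracy metric, and the `tolerance` and `min_gain` thresholds are all hypothetical.

```python
from typing import Callable, Sequence, Tuple

Example = Tuple[float, int]      # (feature, label); hypothetical one-feature dataset
Model = Callable[[float], int]   # a model maps a feature value to a predicted label


def accuracy(model: Model, data: Sequence[Example]) -> float:
    """Fraction of examples the model labels correctly."""
    return sum(model(x) == y for x, y in data) / len(data)


def is_degraded(model: Model, recent: Sequence[Example],
                baseline_accuracy: float, tolerance: float = 0.05) -> bool:
    """Flag a model whose accuracy on recent data falls below its baseline by more than tolerance."""
    return accuracy(model, recent) < baseline_accuracy - tolerance


def pick_winner(champion: Model, challenger: Model,
                recent: Sequence[Example], min_gain: float = 0.02) -> str:
    """Promote the challenger only if it beats the champion by at least min_gain accuracy."""
    gain = accuracy(challenger, recent) - accuracy(champion, recent)
    return "challenger" if gain >= min_gain else "champion"
```

Requiring a minimum gain before promotion is one way to avoid swapping models over differences that are likely noise; a production setup would typically also apply a statistical significance test.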
The machine learning add-ons represent a step toward greater collaboration from Hitachi. Users get data lineage for model steps and visibility into the data sources and features that feed the model, which makes data pipelines easier to share. In a press statement, Hitachi’s John Magee explained: “Hitachi Vantara Labs’ machine learning model management provides improved algorithmic transparency and automation so application teams can focus their efforts on innovating rapidly without risking model deterioration.”
The plug-ins are currently unsupported and will be available for testing.