Google made news this week with the launch of Google Cloud Dataproc, a new infrastructure-as-a-service offering that makes Spark and Hadoop easier, faster, and cheaper to run. The platform was launched in beta on Wednesday. The service lets users take advantage of open source data tools for batch processing, querying, streaming, and machine learning. In addition, Cloud Dataproc allows for swift cluster creation with easy-to-manage tools that help organizations save money by turning off clusters that are not in use.
James Malone, Product Manager at Google, said in a recent blog post about Cloud Dataproc: “Working with large datasets requires powerful tools, but too often those tools add new layers of complexity. To use your data efficiently, you need to minimize the time from data capture to insights. But concerns about deployment, scaling, monitoring, utilization, and cost can get in the way of what matters most: your data. With more data being generated each day, you have less time to peel back the layers of complexity around the tools you rely on for success. We think using powerful data tools should be easy as 1-2-3.”
According to Google, “Cloud Dataproc minimizes the time you spend on administration and management.” Some of the features that set Cloud Dataproc apart from competing cloud services include:
Fast and Scalable Data Processing
Cloud Dataproc clusters can be created quickly and resized at any time. They scale from three to hundreds of nodes and support many machine types, which eases concerns about data pipelines outgrowing their clusters. With each cluster action taking less than 90 seconds, users have more time to focus on insights and less on infrastructure.
Google Cloud Dataproc has a low-cost, easy-to-understand pricing structure that is based on actual use and measured per minute. In addition, Cloud Dataproc clusters can include preemptible instances with lower compute prices, which gives users powerful clusters at a reduced cost.
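To make the per-minute model concrete, here is a minimal sketch of how such a bill might be estimated. The function name, the hourly rates, and the node counts are all hypothetical and are not Google's published prices. The sketch also assumes usage is rounded up to whole minutes and ignores any minimum charge.

```python
import math


def estimate_cluster_cost(runtime_seconds, standard_nodes, preemptible_nodes,
                          standard_rate_per_hour, preemptible_rate_per_hour):
    """Estimate a cluster bill under per-minute metering.

    All rates are hypothetical placeholders, not real Cloud Dataproc prices.
    Assumes usage is rounded up to the next whole minute.
    """
    billed_minutes = math.ceil(runtime_seconds / 60)
    # Combined hourly rate of all nodes, converted to a per-minute rate.
    cluster_rate_per_minute = (
        standard_nodes * standard_rate_per_hour
        + preemptible_nodes * preemptible_rate_per_hour
    ) / 60
    return billed_minutes * cluster_rate_per_minute


# A job that runs for 25 minutes 30 seconds is billed as 26 minutes.
mixed = estimate_cluster_cost(1530, 3, 10, 0.60, 0.18)
all_standard = estimate_cluster_cost(1530, 13, 0, 0.60, 0.18)
print(f"3 standard + 10 preemptible nodes: ${mixed:.2f}")
print(f"13 standard nodes:                 ${all_standard:.2f}")
```

With these made-up rates, the mixed cluster costs less than half as much as the all-standard one for the same job. That is the kind of saving the preemptible option is meant to offer.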
Open Source Ecosystem
The Spark and Hadoop ecosystem provides tools, libraries, and documentation that you can leverage with the Cloud Dataproc platform. Users can get started without needing to learn new tools or APIs because the platform offers frequently updated, native versions of Spark, Hadoop, Pig, and Hive. Likewise, existing projects or ETL pipelines can be moved without redevelopment.
For more information, see Google’s introductory post about Google Cloud Dataproc.