How to Move Big Data in the Cloud

By Tim King, Executive Editor at Solutions Review

By Sarah Lahav

Thanks to the seemingly infinite capacity and elasticity of the large cloud service providers, and their growing focus on Big Data services beyond basic infrastructure, it’s now more appealing than ever for enterprises to consider replacing their on-premises Big Data capabilities with cloud-based data services.

But how do companies move their Big Data initiatives to the cloud? First, there are two main considerations to take into account:

Working out how and where the processing will run
Working out how to get your data close to the processing capacity, ideally so that both the application and the data are in the cloud (data-sensitivity concerns permitting)

There are three common options for processing:

1. Roll your own

If you have specialist Big Data needs and a team of expert data scientists and Big Data administrators, you might choose to build your own Big Data implementation in the cloud. On AWS, this could mean deploying your own virtual machines, installing the software, and connecting the network and data stores. This takes time and a particular skill set to deploy and manage, and it should be reserved for special cases, because it’s the most costly and slowest way to do Big Data in the cloud.
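To make the “roll your own” workload concrete, here is a minimal sketch of only the very first step, assuming Python with boto3 and an existing VPC subnet and security group. It assembles a request for the EC2 run_instances call and launches it; the AMI ID, instance type, network IDs, tag, and bootstrap script are hypothetical placeholders, and the real work of installing and wiring up the Big Data software is left as a stub.

```python
from typing import Any

# Hypothetical bootstrap script: the real install steps depend on which
# Big Data stack you choose, so this only marks where they would go.
BOOTSTRAP_SCRIPT = """#!/bin/bash
# Placeholder: install and configure your chosen Big Data software here.
echo "bootstrap ran" > /var/tmp/bootstrap.log
"""


def build_run_instances_params(
    ami_id: str,
    instance_type: str,
    node_count: int,
    subnet_id: str,
    security_group_ids: list[str],
) -> dict[str, Any]:
    """Assemble keyword arguments for boto3's EC2 client run_instances()."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    return {
        "ImageId": ami_id,
        "InstanceType": instance_type,
        # MinCount == MaxCount: launch exactly node_count instances or none.
        "MinCount": node_count,
        "MaxCount": node_count,
        # Network placement: which subnet and firewall rules the nodes use.
        "SubnetId": subnet_id,
        "SecurityGroupIds": security_group_ids,
        # Runs on first boot; boto3 base64-encodes this for run_instances.
        "UserData": BOOTSTRAP_SCRIPT,
        "TagSpecifications": [
            {
                "ResourceType": "instance",
                "Tags": [{"Key": "role", "Value": "bigdata-node"}],
            }
        ],
    }


def launch_nodes(params: dict[str, Any]) -> list[str]:
    """Launch the instances (requires boto3 and AWS credentials)."""
    import boto3  # imported here so the parameter builder works without boto3

    ec2 = boto3.client("ec2")
    response = ec2.run_instances(**params)
    return [instance["InstanceId"] for instance in response["Instances"]]
```

Actually calling launch_nodes needs AWS credentials and starts billable instances. Everything after launch (configuring the cluster, attaching storage, monitoring, patching) is still yours to build and run, which is where the time and skill set go.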

2. Use a specialist provider on top of the cloud

You can leverage a cloud Big Data specialist such as Cloudera or Hortonworks, who can help you build and run your cloud-based Big Data implementation. They provide everything you need across the Big Data lifecycle. These providers essentially accelerate the “roll your own” approach, so you get there faster than you could by yourself.


3. Use the cloud service provider’s data services

If, like most of the corporate world, you want all the upside of Big Data without the deployment downsides, you can immediately “pass go” by using the cloud service providers’ own data services, such as AWS Lambda and Azure Machine Learning. This is generally the cheapest and fastest way to do Big Data in the cloud.
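To illustrate how little infrastructure code this option leaves you with, here is a minimal sketch assuming AWS Lambda’s Python runtime, with the function subscribed to Amazon S3 “object created” notifications (an assumed setup, not one the article describes). There are no servers to provision; the function only pulls the bucket and key of each new object out of the event, and the processing step and the names in the test event are hypothetical.

```python
import json
import urllib.parse
from typing import Any


def handler(event: dict[str, Any], context: Any) -> dict[str, Any]:
    """Lambda entry point: list the S3 objects that triggered this invocation."""
    objects = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications
        # (for example, a space becomes "+"), so decode them first.
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        size = record["s3"]["object"].get("size", 0)
        objects.append({"bucket": bucket, "key": key, "size": size})
    # Placeholder for real processing (parse, aggregate, write results).
    return {"statusCode": 200, "body": json.dumps({"processed": objects})}
```

Because the function is a plain Python callable, it can be exercised locally with a hand-built event before it is deployed; the provider takes care of invoking and scaling it.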

Then there is the issue of the distance between the processing and the data.

Beware of Data Gravity

Dave McCrory, the founder of four virtualization/cloud startups, once described “data gravity” as a restrictive force that gets stronger as data gets larger. This force makes data heavy to move, in both time and cost, and it also attracts applications, which increase the gravity further. Big Data is ultimately about applications working on very large datasets, so data gravity is a highly pertinent issue.

If you have collected large datasets on-premises and need them analyzed by a cloud-based Big Data system, you have to connect the cloud application to your local data somehow. Thankfully, this has become somewhat easier with dedicated fiber connections between the client network and the cloud service provider. For example, an Azure ExpressRoute connection is a good way both to increase security and to gain enough bandwidth to transfer large local datasets to the cloud.
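Why bandwidth dominates here comes down to simple arithmetic: transfer time is data size divided by effective throughput. A minimal sketch, assuming decimal terabytes and an 80% link utilization default (a guess covering protocol overhead and competing traffic); the 100 Mbps and 10 Gbps figures are illustrative, not quoted ExpressRoute or ISP tiers.

```python
def transfer_time_hours(dataset_tb: float, link_gbps: float, utilization: float = 0.8) -> float:
    """Estimate hours to push dataset_tb terabytes over a link_gbps link.

    utilization is the assumed fraction of nominal bandwidth actually
    achieved after protocol overhead and competing traffic.
    """
    if link_gbps <= 0 or not 0 < utilization <= 1:
        raise ValueError("link_gbps must be positive and utilization in (0, 1]")
    bits = dataset_tb * 1e12 * 8  # decimal terabytes -> bits
    effective_bps = link_gbps * 1e9 * utilization
    return bits / effective_bps / 3600


for label, gbps in [("100 Mbps Internet uplink", 0.1), ("10 Gbps dedicated circuit", 10.0)]:
    hours = transfer_time_hours(100, gbps)
    print(f"{label}: {hours:,.1f} hours ({hours / 24:,.1f} days) for 100 TB")
```

Under these assumptions, 100 TB takes roughly 2,778 hours (about 116 days) over the 100 Mbps link versus roughly 28 hours over the 10 Gbps one, which is the gap a dedicated connection closes.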

Trying to accomplish this over “normal” Internet connections will most likely not work for very large datasets because of latency and bandwidth constraints, and doing it over a corporate WAN can be extremely costly. Another alternative is to ship physical storage media to the cloud provider, using a facility such as AWS Import/Export.
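Whether shipping media beats the network is a straightforward comparison of elapsed time. A minimal sketch, assuming the media route has a fixed end-to-end turnaround (copying to disks, courier transit, provider ingest) that you estimate yourself; the 80% utilization default and the example figures are illustrative assumptions, and per-device capacity limits and costs are ignored.

```python
def network_days(dataset_tb: float, link_gbps: float, utilization: float = 0.8) -> float:
    """Days to send dataset_tb (decimal TB) over the link at the given utilization."""
    return dataset_tb * 8e12 / (link_gbps * 1e9 * utilization) / 86_400


def choose_transfer(dataset_tb: float, link_gbps: float, shipping_days: float,
                    utilization: float = 0.8) -> str:
    """Pick whichever route lands the data in the cloud sooner.

    shipping_days is the assumed end-to-end time for the media route:
    copying to disks, courier transit, and the provider loading the data.
    """
    online = network_days(dataset_tb, link_gbps, utilization)
    if online <= shipping_days:
        return f"network ({online:.1f} days)"
    return f"ship media ({shipping_days:.1f} days vs {online:.1f} days online)"


print(choose_transfer(100, 0.1, shipping_days=10))
print(choose_transfer(5, 1.0, shipping_days=10))
```

With a 10-day shipping turnaround, 100 TB over a 100 Mbps link comes out as "ship media (10.0 days vs 115.7 days online)", while 5 TB over a 1 Gbps link comes out as "network (0.6 days)". Network time grows linearly with data size, while shipping time stays roughly flat until you need more devices, so the crossover point moves with both dataset size and link speed.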

Advice on Getting Started

If you are just getting started with Big Data in the cloud, make your first steps easier by using a cloud service provider’s data services. If you later outgrow that service or develop specialist needs, consider seeking expert advice on how to meet them.

SysAid Technologies’ first employee, Sarah has been a vital link between SysAid and its customers since 2003 and is now CEO. As CEO, she takes a hands-on role in evolving SysAid to meet the dynamic needs of service managers. Previously, Sarah was VP of Customer Relations at SysAid and developed SysAid’s Certification Training program, advancing the teaching methods and training technology in place today. Sarah holds a B.Sc. in Industrial Engineering, specializing in Information Technology, from The Open University of Israel, and spends her free time with her three beautiful children.

This article was written by Tim King on May 20, 2016

Tim King

Executive Editor

Tim is Solutions Review's Executive Editor and leads coverage on data management and analytics. A 2017 and 2018 Most Influential Business Journalist and 2021 "Who's Who" in Data Management, Tim is a recognized industry thought leader and changemaker. Story? Reach him via email at tking@solutionsreview dot com.
