How to Choose Between a Cloud Data Warehouse vs. Data Lake

This is part of Solutions Review’s Premium Content Series, a collection of contributed columns written by industry experts in maturing software categories. In this submission, Next Pathway CEO Chetan Mathur offers advice on how to choose between a cloud data warehouse and a data lake, with key points to consider.

By now most organizations have adopted a cloud-first policy. The use of public cloud computing is essential to business, and this has never been more true as companies spend more time and effort on digital transformation initiatives. However, the choice of which Cloud Data Platform (CDP) to select can be daunting. Cloud providers are spending millions to promote their features and capabilities, while simultaneously making it harder to differentiate one from the other. Selecting a cloud platform requires serious consideration, as the migration effort (to move legacy data and code) isn’t trivial.

Cloud Data Platforms such as Snowflake and Databricks offer SaaS models, which provide cloud-based scalability, flexibility, and usage-based pricing. These platforms are purpose-built with specific technology riding on top of the cloud. Both have considerable features and benefits. How does an organization select the most appropriate cloud data platform? Most likely your approach and choice will be based on your usage patterns, data volumes, workloads, and data strategies. It may also include your future plans for executing data analytics and how you plan to monetize your digital products and assets.

Let’s take a closer look at Snowflake and Databricks as examples. These two platforms essentially provide the same services, but comparing them is not apples to apples. According to eWeek’s 2022 reviewers report, Snowflake provides tremendous data analytics power and is easier to use, but does not handle unstructured data as well. Databricks, on the other hand, has unlimited scalability and can handle unstructured and real-time data, but requires extensive skill and expertise to manage complex AI and data science use cases. Both platforms continue to evolve and improve, and in time many of these gaps may narrow. Regardless of which platform you select, both will require a commitment in terms of cost, time, and resources.

At Next Pathway, we build software that automates the translation and migration of both legacy code and data to all cloud platforms and data cloud platforms. We support all varieties of cloud targets, so when we are asked for our opinion, we advise our clients to answer some basic questions:

  1. What type of data do you have?
  2. What type of applications are you running?
  3. Where is this data (and code) stored in your organization?
  4. What do you want to do with your data (now and in the near term)?
  5. What type of in-house skills do you have available to assist in your cloud deployment?
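
One way to keep the answers to these five questions side by side is to record them in a small structure and tally which way they point. The sketch below is only an illustration, not a method prescribed here: the field names, the `lean` function, and the equal weighting of each answer are all assumptions made for the example. It borrows the trade-offs cited above (unstructured data, real-time workloads, and AI or data science favor a data lake; limited in-house expertise favors an easier-to-use warehouse).

```python
from dataclasses import dataclass


@dataclass
class Assessment:
    """Answers to the five questions. All field names are hypothetical."""
    has_unstructured_data: bool        # Q1: what type of data do you have?
    needs_real_time: bool              # Q2: what type of applications are you running?
    legacy_sources: list[str]          # Q3: where are the data and code stored today?
    plans_ai_or_data_science: bool     # Q4: what do you want to do with the data?
    has_data_engineering_skills: bool  # Q5: what in-house skills are available?


def lean(a: Assessment) -> str:
    """Return 'data lake', 'data warehouse', or 'either' as a rough starting point.

    Assumption: every answer carries equal weight. Q3 (legacy_sources) is
    recorded for migration planning only and does not affect the tally.
    """
    lake = sum([a.has_unstructured_data, a.needs_real_time, a.plans_ai_or_data_science])
    warehouse = 3 - lake
    # Limited in-house skills count toward the easier-to-use warehouse option.
    if not a.has_data_engineering_skills:
        warehouse += 1
    if lake > warehouse:
        return "data lake"
    if warehouse > lake:
        return "data warehouse"
    return "either"


# Example: unstructured, real-time data with AI plans and a skilled team.
example = Assessment(True, True, ["Hadoop"], True, True)
print(lean(example))
```

A tally like this is a conversation starter at best. Real selections also depend on data volumes, usage patterns, and cost, which this sketch does not model.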

The answers to these questions will go a long way in helping you decide between a cloud data warehouse and a cloud data lake. Keep in mind that moving to a cloud data platform is a modernization of your data systems. The most efficient path is to first ‘translate’ your existing legacy code and data for deployment to the cloud, and then, over time, plan the end-state transformation that leverages the cloud-native features available in your choice of CDP.

This approach allows organizations to avoid vendor lock-in and focus on the most beneficial use cases up front without making an ‘all-in’ commitment to one particular platform and technology. This point cannot be stressed enough: keep your cloud initiative focused and specific, and build upon your success.

We are seeing companies select both cloud data warehouses and cloud data lakes when migrating their enterprise data warehouses and data lakes to cloud data platforms. Recently, we helped a client migrate its Enterprise Data Hub to Databricks. In this case, the client selected an initial pilot to demonstrate the efficacy of moving its Hadoop Platform to Databricks Notebooks, using Next Pathway’s automated translation tool (SHIFT).

This approach enabled the client to quickly migrate its legacy code to Databricks, without the major cost and risk of rewriting or transforming its code base, while taking advantage of most of the benefits of the cloud. It also enabled the client to plan the next steps and maintain flexibility, while carefully assessing the pros and cons of migrating to a CDP. Some optimization and transformation were required in this case, but the decision to rewrite code was based on unique or specific requirements to improve performance and boost cloud benefits. In this way, clients can leverage the cloud without the cost and risk of a full transformation, while still targeting specific cloud features that benefit the organization.

Chetan Mathur