The Era of the Data Lake Is Over: Think Hybrid Data Cloud
This is part of Solutions Review’s Premium Content Series, a collection of contributed columns written by industry experts in maturing software categories. In this submission, Cloudera Field CTO and Cyber Security Lead Carolyn Duby explains why the era of the data lake is being replaced by hybrid data cloud.
According to Mordor Intelligence, the market for data lakes, valued at $3.74 billion in 2020, is expected to reach $17.60 billion by 2026. However, companies relying solely on a data lake strategy will eventually face critical limitations on their agility and ability to innovate.
While data lakes can be an easy and cost-effective way to aggregate data from multiple silos and make it accessible to analysts, this approach brings challenges with data quality, lineage, governance, and security:
- Data sets can quickly become dated, and lineage is difficult to track.
- Data lakes are effective for batch processing, not real-time analytics.
- Managing access to sensitive and personally identifiable information (PII) is very challenging, especially across multiple clouds with different security and governance processes.
- Data in a data lake can’t easily be moved from one cloud to another to optimize for workload capability or cost.
- Data can’t easily be moved between public clouds and private clouds or between clouds and on-premises systems to meet compliance requirements.
As a result, the data in a data lake essentially becomes a new type of silo, making it extremely challenging to build use cases involving data sets from multiple locations, which limits and slows innovation.
Any data strategy designed for agility and innovation must have hybrid at its core. According to IDC, hybrid is a key factor driving the cloud market: “Hybrid cloud has become central to successful digital transformation efforts by defining an IT architectural approach, an IT investment strategy, and an IT staffing model that ensures the enterprise can achieve the optimal balance across dimensions without sacrificing performance, reliability, or control.”
In an ideal data environment, we could simply describe a workload we need to run, and the data platform would automatically determine where to run it to maximize performance and minimize cost while ensuring data security and compliance. Unfortunately, such a solution doesn’t yet exist. Its foundation, however, is a “hybrid data cloud” that makes it easy to move data sets and workloads between any locations (multiple public and private clouds and on-premises systems) and centralizes the management of all this data. A hybrid data cloud lets enterprises start building the automated data infrastructure of the future today. Here are the key considerations for building a hybrid data cloud environment.
Ensure Security and Governance Everywhere
A hybrid data cloud enables a write-once/run-anywhere approach to data management. Public cloud security models vary, and data teams should not have to implement a different security model for each cloud. Instead, a hybrid data cloud centralizes security and governance across all environments, auditing and monitoring user activity and access to meet compliance requirements.
This holistic approach enables organizations to move workloads wherever they want without compromising security and compliance. For example, a U.S. company expanding into the EU faces daunting and expensive data challenges related to the General Data Protection Regulation (GDPR). Some data and workloads must be moved to the EU. Some must be stored in the cloud. Some must remain on-prem. A hybrid data cloud offers the flexibility to ensure GDPR compliance by locating data and workloads wherever they need to be.
Move Workloads to Optimize for Cost
According to a 451 report, 57 percent of organizations say hybrid is the organizing principle of their IT environments. Today, the cost of running some workloads in the cloud is leading to cloud repatriation, which means moving those workloads back on-prem. Further, a particular workload may perform better in one cloud than another, and the cost of running it may vary from cloud to cloud and region to region. A hybrid data cloud makes it easy to move workloads to optimize for cost.
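To make the cost comparison concrete, here is a minimal sketch of choosing the cheapest location for a workload. It is an illustration only, not any vendor's API: the `Location` type, the location names, and the hourly rates are all hypothetical, and a real comparison would also have to account for data transfer and performance differences.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    name: str           # hypothetical label for a cloud region or data center
    hourly_cost: float  # illustrative cost per compute hour, not a real price


def cheapest_location(locations, hours):
    """Return (location, estimated_total_cost) for the lowest-cost option."""
    if not locations:
        raise ValueError("no candidate locations")
    best = min(locations, key=lambda loc: loc.hourly_cost * hours)
    return best, best.hourly_cost * hours


# Made-up candidates: two public cloud regions and one on-prem data center.
candidates = [
    Location("cloud-a/us-east", 4.0),
    Location("cloud-b/eu-west", 3.5),
    Location("on-prem/dc-1", 2.5),
]

best, cost = cheapest_location(candidates, hours=10)
print(best.name, cost)
```

With these invented rates, the on-prem data center wins, which mirrors the repatriation scenario described above. Changing the rates or the run time changes the answer, which is the point: the decision should be recomputed as prices shift.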
Focus on Customer Experience
Today, experience is king, and performance, cost, security, and compliance must all be rolled up into the impact on customers, including internal users. Great experiences often depend on real-time data to reach customers “in the moment.” Retailers want to present customers with personalized offers while they are still in a store, not three days later. This requires not only instant access to data but also multi-function analytics across multiple platforms, such as CRM, inventory, and promotions. A hybrid data cloud makes this possible.
Establish a Framework
Building a hybrid data cloud starts with a framework of requirements. What are the security and compliance issues? What are the cost and performance issues? What workloads must stay on-prem? Which clouds are best for which types of workloads? A multinational bank, for example, may face very different privacy requirements in different regions of the world. These requirements, as well as the requirements for cost and performance, must be fully understood and codified, so a supporting technology platform can begin to assist with or automate balancing all the competing demands to ensure the desired customer experience.
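One way to picture what “codified” requirements might look like is a small placement policy: each workload declares its constraints, and a function filters the available sites before choosing among them. The sketch below is a hedged illustration under simple assumptions. The `Site` and `Workload` types, the field names, the site names, and the costs are all hypothetical, and real residency and privacy rules are far richer than a region tag and an on-prem flag.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    name: str           # hypothetical site label
    region: str         # coarse jurisdiction tag, e.g. "EU" or "US"
    on_prem: bool       # True for a company-run data center
    hourly_cost: float  # illustrative figure, not a real price


@dataclass(frozen=True)
class Workload:
    name: str
    allowed_regions: frozenset  # regions where the data may reside
    must_stay_on_prem: bool     # True if the workload cannot run in a cloud


def eligible_sites(workload, sites):
    """Keep only the sites that satisfy the workload's compliance rules."""
    return [
        s for s in sites
        if s.region in workload.allowed_regions
        and (s.on_prem or not workload.must_stay_on_prem)
    ]


def place(workload, sites):
    """Pick the cheapest compliant site, or None if no site qualifies."""
    options = eligible_sites(workload, sites)
    if not options:
        return None
    return min(options, key=lambda s: s.hourly_cost)


sites = [
    Site("public-cloud-us", "US", False, 3.0),
    Site("public-cloud-eu", "EU", False, 3.5),
    Site("datacenter-eu", "EU", True, 5.0),
]

analytics = Workload("eu-customer-analytics", frozenset({"EU"}), False)
ledger = Workload("eu-payments-ledger", frozenset({"EU"}), True)

print(place(analytics, sites).name)
print(place(ledger, sites).name)
```

Compliance acts as a hard filter here and cost only breaks ties among compliant sites, so the cheapest site overall (the U.S. cloud in this made-up data) is never chosen for EU-restricted workloads. A workload with no compliant site returns `None` rather than being placed somewhere it should not run.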
A fully automated platform that optimizes where workloads should run will be great whenever it arrives. In the meantime, you can’t continue relying on legacy data lakes that limit your ability to innovate and accelerate your business. A hybrid data cloud can enable you to transform your business today by letting you move data and workloads where they need to be to optimize for performance, cost, and customer experience while ensuring security and compliance on a global scale.