Now is the Time to Shift to Disaster Recovery as a Cloud-Based Service
This is part of Solutions Review’s Premium Content Series, a collection of contributed columns written by industry experts in maturing software categories. In this submission, JetStream Software’s President and Co-Founder Rich Petersen makes the case for why organizations should shift to disaster recovery as a cloud-based service right now.
With the increased pace of digital transformation, it’s more important than ever to plan and prepare for disaster recovery and business continuity (DR/BC). At the same time, IT leaders are increasingly looking to major cloud providers to offer data storage and systems recovery environments for Disaster Recovery as a Service (DRaaS).
It’s useful to consider why this might be the right time to shift to DRaaS in the cloud. With the advance of the software-defined data center (SDDC) over the past decade, most on-premises data centers have virtualized compute, storage and network functions. At the same time, VMware has developed partnerships with all the major public cloud providers as well as thousands of regional managed service providers (MSPs). As a result, IT leaders running production workloads on VMware on-premises can partner with a cloud provider to ensure that a comparable VMware environment will be available as a failover destination in the cloud.
The shift from DIY DR to DRaaS entails important considerations, including infrastructure costs, operational changes, business continuity service levels, and vendor accountability. Thanks to recent developments, the public cloud is often the best option for enterprise DRaaS, but IT and business decision-makers should make the shift with a clear understanding of what’s required — and what’s possible — with public cloud DRaaS.
The financial advantages of DRaaS in the public cloud stem from the fact that cloud data storage can be quite economical, and in many cases, compute and network resources can be provisioned for failover only when needed, reducing the cost of recovery site resources. This is a key difference that can make cloud DRaaS significantly more cost-effective than on-premises DR, but the risks and tradeoffs of this strategy must be fully understood to make sure that one emergency (a DR incident) doesn’t result in a second emergency (a lengthy interruption to operations).
A common best practice in cloud DRaaS is to maintain a “pilot light” cluster in the cloud comprising a small number of hosts. This pilot light cluster ensures that essential configurations (DNS, DHCP, etc.) are already defined and available before the cluster is needed for failover. The hosts in the pilot light cluster can be licensed as reserved instances, typically with a one- to three-year commitment. If additional hosts are required in the event of a failover, the additional hosts can be added to the cluster relatively quickly and are purchased by the hour or by the minute. Once the primary site is restored and VMs and their data return to their original environment, the short-term hosts can be removed from the cluster.
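The cost trade-off behind the pilot light approach can be sketched with simple arithmetic. The example below is a minimal illustration, not a pricing model: the function name, host counts, hourly rates, and failover duration are all hypothetical placeholders, and real reserved and on-demand prices vary by provider and commitment term.

```python
HOURS_PER_YEAR = 8760  # 365 days * 24 hours


def annual_dr_cost(pilot_hosts, reserved_rate, burst_hosts, on_demand_rate, failover_hours):
    """Estimate yearly recovery-site host cost for a pilot light cluster.

    pilot_hosts    -- hosts kept running all year on a reserved commitment
    reserved_rate  -- hypothetical hourly price per reserved host
    burst_hosts    -- extra hosts added only while a failover is active
    on_demand_rate -- hypothetical hourly price per on-demand host
    failover_hours -- total hours per year spent failed over (tests included)
    """
    reserved_cost = pilot_hosts * reserved_rate * HOURS_PER_YEAR
    burst_cost = burst_hosts * on_demand_rate * failover_hours
    return reserved_cost + burst_cost


# Hypothetical scenario: 10 hosts are needed to run production after failover.
# Option A keeps all 10 running year-round; option B keeps 3 and bursts 7 for 72 hours.
full_standby = annual_dr_cost(10, 5, 0, 8, 0)
pilot_light = annual_dr_cost(3, 5, 7, 8, 72)
print(full_standby, pilot_light)
```

With these made-up numbers the pilot light option costs far less, but the result depends entirely on the rates and on how long a failover actually lasts, which is why the time needed to add hosts and the length of a realistic outage belong in the estimate.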
A key lesson that enterprises have learned when moving from on-premises software licensed from major vendors to the “same” software provided as a service from a public cloud provider is that there will be differences as to what can and can’t be done in the cloud. These differences can be thought of in terms of functionality, administration, and infrastructure requirements.
First, consider functionality. A software platform provided as a cloud-based service can be configured, but unlike an on-premises deployment, cloud software services cannot be highly customized. What does this mean for DR? Basically, it means that the features and functionality available to your systems when you recover them in the cloud may be slightly different from those in your on-premises data center. The most noteworthy difference is that VMware Cloud Foundation incorporates vSAN and NSX-T in addition to vCenter and vSphere. If your on-premises environment doesn’t include VMware’s storage and network virtualization capabilities, you will have to prepare to operate with them after a failover.
Regarding infrastructure, it’s important to remember that the cloud offers economical data storage that’s effectively limitless as well as compute and network resources that can be provisioned on demand. That makes cloud-based DRaaS a potential game-changer in terms of cost savings. However, relying on the cloud for DRaaS may require changes to the protected on-premises data center and the network connection between it and the cloud. Fundamentally, DR/BC technologies that capture data for replication in real time (in contrast to intermittent snapshots) will place additional load on the compute, storage and network infrastructure. To preserve the same level of application performance, additional compute, memory and low-latency storage may be needed. The ideal resource allocation can typically be determined through tools that observe the I/O activity of the protected systems. Additionally, the network connection between the protected data center and the cloud data center will be carrying roughly as much data as is written locally. It’s quite possible that a dedicated, high-bandwidth network connection will be required to reduce the risk of a replication bottleneck.
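Because the replication link carries roughly as much data as is written locally, a first-pass sizing can start from the observed write rate. The sketch below is a rough illustration only: the function names, the 1.5x headroom factor, and the sample figures are assumptions, and it ignores compression, deduplication, and protocol overhead. It uses decimal units, with MB meaning megabytes and Mbps meaning megabits per second.

```python
def required_link_mbps(write_mb_per_s, headroom=1.5):
    """Link speed (megabits/s) needed to carry a sustained write rate (megabytes/s).

    headroom is an assumed safety factor for bursts; 1.5 is a placeholder,
    not a recommendation.
    """
    return write_mb_per_s * 8 * headroom


def backlog_drain_seconds(backlog_mb, link_mbps, write_mb_per_s):
    """Seconds needed to clear a replication backlog while new writes keep arriving.

    Returns None when the link has no spare capacity, meaning the backlog
    would never shrink.
    """
    link_mb_per_s = link_mbps / 8
    spare_mb_per_s = link_mb_per_s - write_mb_per_s
    if spare_mb_per_s <= 0:
        return None
    return backlog_mb / spare_mb_per_s


# Hypothetical workload: 40 MB/s of sustained writes observed on the protected systems.
link = required_link_mbps(40)
print(link)
print(backlog_drain_seconds(6000, link, 40))
print(backlog_drain_seconds(1000, 320, 40))
```

The second function shows why a link sized exactly to the average write rate is risky: once a backlog forms, there is no spare capacity to drain it, and the achievable recovery point slips further behind.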
Finally, consider administration. The advantage of using a platform like VMware for cloud-based DRaaS is that it’s familiar to the IT administrator. The downside, however, is that the administrator will not have the same privileges and abilities in the cloud provider’s VMware environment as in the on-premises data center. This is similar to the distinction between the ability to both configure and customize software on-premises and the limitation on customization of a cloud-based service. For very sound security reasons, network administration will be limited in some respects, as will access to host servers and storage. With a focus on maintaining security and operational consistency on a global scale, the largest cloud providers will likely never support hardware-dependent capabilities like raw device mapping or array-based replication. However, smaller VMware cloud service partners might be more accommodating to special requests.
When you partner with a major public cloud vendor for disaster recovery infrastructure and operations, many things should remain the same. The service levels for systems, meaning the recovery point objectives (RPOs) and recovery time objectives (RTOs), should not be relaxed. Thorough, non-disruptive failover testing should be conducted on a regular basis. The performance and operations of the protected systems should not be disturbed by the capture and replication of their data to the cloud. And all recovery data and metadata should be stored securely and immutably.
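One way to confirm that service levels have not slipped after the move is to compare each failover test against its targets. The following is a minimal sketch under stated assumptions: the record layout, field names, system names, and numbers are hypothetical and do not come from any particular DRaaS product.

```python
from dataclasses import dataclass


@dataclass
class FailoverTestResult:
    """Outcome of one failover test for one protected system (hypothetical layout)."""
    system: str
    rpo_target_s: int          # maximum tolerable data loss, in seconds
    rto_target_s: int          # maximum tolerable time to recover, in seconds
    measured_data_loss_s: int  # data loss observed in the test
    measured_recovery_s: int   # recovery time observed in the test


def violations(results):
    """Return (system, objective) pairs where a test missed its target."""
    problems = []
    for r in results:
        if r.measured_data_loss_s > r.rpo_target_s:
            problems.append((r.system, "RPO"))
        if r.measured_recovery_s > r.rto_target_s:
            problems.append((r.system, "RTO"))
    return problems


# Made-up test results for two systems.
tests = [
    FailoverTestResult("orders-db", 60, 900, 45, 600),
    FailoverTestResult("billing", 300, 1800, 120, 2400),
]
print(violations(tests))
```

Running the same check with the targets that applied on-premises makes it easy to see whether the cloud-based service is holding the line or quietly loosening it.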
Today, it’s easier than ever to make the shift from managing DR on your own to relying on a cloud provider’s DRaaS solution. Nevertheless, there are important considerations to keep in mind when making the transition. One thing that should not change is the business priorities that drive your DR strategy. Every organization will have systems and operations of varying levels of criticality. If operational interruption to certain systems poses a threat to your core business and/or the health and safety of your employees and stakeholders, then you should require the same level of protection for those systems from the DRaaS offering.