Minimizing Downtime in the Modern Data Protection Era
This is part of Solutions Review’s Premium Content Series, a collection of contributed columns written by industry experts in maturing software categories. In this submission, Dave Bermingham of SIOS Technology outlines how IT professionals can minimize downtime in the modern data protection era.
In today’s data-driven economy, IT is under pressure from end-users to deliver near-continuous access to their organization’s data and applications. While there are more options than ever before for providing high availability (HA) and disaster recovery (DR) protection for data and applications, there is still a great deal of confusion about what is needed. Let’s look at how HA/DR has evolved over the years and clarify what is necessary to protect today’s business-critical applications and data in on-prem, cloud, and hybrid cloud environments.
The Evolution of HA/DR
In IT terms, application and data protection has always been based on eliminating single points of failure (SPoF) through redundancy, that is, on ensuring that there is more than one of every vital hardware, software, and networking component required to run an application and access its data. If one component fails, a secondary component is ready to take over, minimizing downtime and data loss.
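Why redundancy reduces downtime can be shown with the standard formula for parallel availability. It assumes that component failures are independent and that any one surviving copy can carry the workload. For n copies of a component with availability A, the service is down only when all n copies are down at once:

```latex
A_{\text{redundant}} = 1 - (1 - A)^{n}

% Hypothetical example with A = 0.99:
% n = 1:  1 - 0.01^{1} = 0.99
% n = 2:  1 - 0.01^{2} = 0.9999
```

With a hypothetical per-component availability of 0.99, adding a second copy cuts expected unavailability from 1% to 0.01%.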
For many years, IT has protected its most business-critical applications using failover cluster configurations. In these configurations, two or more server nodes (designated as the primary node and one or more secondary nodes) are connected to shared storage, typically a storage area network (SAN). Clustering software monitors the health of the application and, if an issue arises, moves operations to a secondary node in the cluster. Because the primary and secondary nodes are connected to the same shared storage, the application can continue operating with identical data.
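The failover behavior described above can be sketched as a toy Python model. The names `SharedStorage`, `Node`, `FailoverCluster`, and `check_and_failover` are hypothetical and do not correspond to any clustering product's API. A real cluster would detect failures with heartbeats and timeouts rather than a boolean flag.

```python
from dataclasses import dataclass, field


@dataclass
class SharedStorage:
    """Stand-in for a SAN volume that every node can reach."""
    data: dict = field(default_factory=dict)


@dataclass
class Node:
    name: str
    storage: SharedStorage
    healthy: bool = True


class FailoverCluster:
    """Toy model of a shared-storage failover cluster (hypothetical API)."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.active = self.nodes[0]  # the primary node runs the application first

    def write(self, key, value):
        # The application writes through whichever node is currently active.
        self.active.storage.data[key] = value

    def check_and_failover(self):
        """One monitoring pass: if the active node is unhealthy, move the
        application to the first healthy secondary node."""
        if self.active.healthy:
            return self.active
        for node in self.nodes:
            if node is not self.active and node.healthy:
                self.active = node
                return node
        raise RuntimeError("no healthy node available")


san = SharedStorage()
cluster = FailoverCluster([Node("primary", san), Node("secondary", san)])
cluster.write("orders", 42)
cluster.nodes[0].healthy = False  # simulate a primary-node failure
survivor = cluster.check_and_failover()
print(survivor.name, survivor.storage.data["orders"])  # secondary 42
```

Because both nodes hold a reference to the same `SharedStorage` object, the secondary sees the primary's write immediately. The same fact explains the drawback: losing that one object would take down both nodes.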
While this configuration is still used today, it poses several challenges that have led to newer HA clustering options. First, the shared storage is itself a SPoF: if the SAN fails, is damaged, or becomes inaccessible, every node in the cluster is rendered inoperable. Second, SAN storage can be very expensive and requires specialized skills to maintain and manage. Third, in Linux environments, creating a failover cluster can be complex and manual, requiring custom scripting to ensure that application failover is orchestrated according to best practices.
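The first challenge can be quantified with standard reliability arithmetic. This sketch assumes independent failures and uses hypothetical availability figures (0.99 per node, 0.999 for the shared array). Because every node depends on the same array, the array is in series with the redundant node group, so adding nodes can never lift cluster availability above the array's own.

```python
def shared_storage_cluster_availability(node_availability: float,
                                        node_count: int,
                                        storage_availability: float) -> float:
    """Availability of a shared-storage cluster, assuming independent failures."""
    # The node group is up if at least one node is up.
    any_node_up = 1.0 - (1.0 - node_availability) ** node_count
    # Every node needs the same shared array, so the array is in series
    # with the node group: both must be up for the application to run.
    return storage_availability * any_node_up


# Hypothetical figures: 0.99 per node, 0.999 for the shared array.
for nodes in (1, 2, 3):
    availability = shared_storage_cluster_availability(0.99, nodes, 0.999)
    print(f"{nodes} nodes: {availability:.7f}")
```

Each extra node helps less than the last, and the total stays capped at the array's 0.999.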
Because of these limitations, many companies kept their most critical applications and databases in on-premises data centers, where traditional high availability clustering could protect them, rather than moving them to the cloud.
High Availability and Disaster Recovery in Modern Data Protection Infrastructure
Today, newer technologies enable IT to build failover clusters that provide both HA and DR in on-prem, cloud, multi-cloud, and hybrid cloud environments. Instead of relying on shared storage, each cluster node uses its own local storage. Efficient, host-based, block-level replication keeps the storage on the primary node identical to the storage on every other node in the cluster. The replication software integrates with standard clustering software in both Windows and Linux environments.
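A minimal in-memory sketch of host-based, block-level replication follows. It assumes a fixed 4 KiB block size and ships each changed block to the other nodes by direct method call. Real replication software sends blocks over the network and handles ordering, retries, and resynchronization, none of which is modeled here. All names are hypothetical.

```python
BLOCK_SIZE = 4096  # assumed block size in bytes, for illustration only


class Volume:
    """A node's local disk, modeled as a bytearray of fixed-size blocks."""

    def __init__(self, num_blocks: int):
        self.data = bytearray(num_blocks * BLOCK_SIZE)

    def write_block(self, index: int, payload: bytes) -> None:
        if len(payload) != BLOCK_SIZE:
            raise ValueError("payload must be exactly one block")
        start = index * BLOCK_SIZE
        self.data[start:start + BLOCK_SIZE] = payload


class ReplicatedVolume:
    """Host-based replication: the primary applies each block write locally,
    then sends the same (block index, bytes) pair to every other node."""

    def __init__(self, primary: Volume, targets: list):
        self.primary = primary
        self.targets = targets

    def write_block(self, index: int, payload: bytes) -> None:
        self.primary.write_block(index, payload)
        for target in self.targets:
            # Only the changed block is sent, not the whole volume.
            target.write_block(index, payload)


primary, node2, node3 = Volume(8), Volume(8), Volume(8)
replicated = ReplicatedVolume(primary, [node2, node3])
replicated.write_block(3, b"x" * BLOCK_SIZE)
print(primary.data == node2.data == node3.data)  # True
```

Replicating at the block level means the mechanism does not need to understand the files or database pages stored on the volume.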
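Cluster nodes can be located in geographically separated locations for DR, and the replication software can use synchronous or asynchronous replication depending on the latency the database or application can tolerate. Synchronous replication, which waits for each write to be committed on the remote node before acknowledging it, is typically used between nodes with low network latency. Asynchronous replication is typically used across the longer distances to a DR site, where waiting on every write would slow the application.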
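The latency trade-off between the two modes can be illustrated with a simple model. It assumes the primary sends each write to all replicas in parallel and folds the remote commit time into each round-trip time. The function name and the millisecond figures are hypothetical, chosen only for illustration.

```python
def write_latency_ms(local_commit_ms: float, replica_rtts_ms: list, mode: str) -> float:
    """Latency one write adds for the application under a simple model."""
    if mode == "synchronous":
        # Acknowledged only after every replica confirms, so the slowest
        # replica's round trip is added to every write.
        return local_commit_ms + max(replica_rtts_ms, default=0.0)
    if mode == "asynchronous":
        # Acknowledged after the local commit; replicas catch up in the
        # background, so writes no replica has received yet are lost
        # if the primary fails.
        return local_commit_ms
    raise ValueError(f"unknown replication mode: {mode!r}")


# Hypothetical figures: 0.5 ms local commit, 1 ms to a nearby node,
# 40 ms to a distant DR node.
print(write_latency_ms(0.5, [1.0], "synchronous"))         # 1.5
print(write_latency_ms(0.5, [1.0, 40.0], "synchronous"))   # 40.5
print(write_latency_ms(0.5, [1.0, 40.0], "asynchronous"))  # 0.5
```

Under these assumptions, a single distant synchronous replica sets the pace for every write. Asynchronous mode avoids that cost in exchange for a small window of potential data loss.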
This SANless, or shared-nothing, clustering configuration not only eliminates the SPoF risk of shared storage but also allows companies to migrate mission-critical applications, such as SQL Server, SAP, SAP HANA, Oracle, and others, to the cloud without giving up the HA and DR protection they had on-premises.
Another innovation is application-aware clustering software. This software includes application-specific recovery kits that monitor the entire application stack. The kits ensure that complex applications and databases, such as SAP HANA, and their required services fail over according to best practices for optimal performance and failover reliability. Advanced solutions automate many configuration steps and validate user inputs to prevent misconfigurations that could put applications and data at risk during and after failover. These solutions often offer greater configuration flexibility and failover reliability than the clustering tools provided by OS vendors.
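The ordered failover that recovery kits perform can be approximated as a dependency graph. This sketch uses Python's standard `graphlib` module (Python 3.9+) to derive a start order. It also rejects two kinds of misconfiguration: an undefined dependency and a circular dependency. The resource names in `RESOURCES` and the helper functions are hypothetical and do not reflect the structure of any vendor's recovery kit.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical resource hierarchy for one protected application.
# Each key maps to the set of resources that must be online before it starts.
RESOURCES = {
    "virtual_ip": set(),
    "replicated_volume": set(),
    "database": {"replicated_volume"},
    "app_services": {"database", "virtual_ip"},
}


def validate(resources: dict) -> None:
    """Reject configurations that would make a failover unreliable."""
    defined = set(resources)
    for name, deps in resources.items():
        missing = set(deps) - defined
        if missing:
            raise ValueError(f"{name} depends on undefined resources: {sorted(missing)}")
    try:
        TopologicalSorter(resources).prepare()
    except CycleError as exc:
        # The second element of CycleError.args holds the detected cycle.
        raise ValueError(f"circular dependency: {exc.args[1]}") from exc


def start_order(resources: dict) -> list:
    """Order for bringing resources online on the target node: dependencies first."""
    validate(resources)
    return list(TopologicalSorter(resources).static_order())


def stop_order(resources: dict) -> list:
    """Order for taking resources offline on the failing node: dependents first."""
    return list(reversed(start_order(resources)))


print(start_order(RESOURCES))
print(stop_order(RESOURCES))
```

The explicit undefined-dependency check matters because `graphlib` silently adds any unknown predecessor as a new node instead of reporting it.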
Role of Cloud in Modern Data Protection
With advances in public cloud technology, confusion persists about the role of the cloud in HA/DR and whether HA/DR clustering is needed in the cloud at all. Public cloud vendors offer service level agreements (SLAs) that commit to a level of VM availability, but those SLAs cover only hardware and networking. There are many conditions, such as software incompatibilities and resource contention, in which VMs are fully operational but applications and data are “down.” For this reason, IT teams need to implement high availability protection for data and applications, even in public cloud environments.
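The gap between a VM being up and an application being available can be shown with a two-layer health check. In this sketch the probes are injected callables, hypothetical placeholders for something like an instance ping and a test database query, so the example runs without any cloud or database access.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class HealthReport:
    vm_reachable: bool
    app_responding: bool

    @property
    def status(self) -> str:
        if not self.vm_reachable:
            return "infrastructure down"
        if not self.app_responding:
            # The VM is up, so an infrastructure SLA is met, yet users see an outage.
            return "application down"
        return "healthy"


def check(vm_probe: Callable[[], bool], app_probe: Callable[[], bool]) -> HealthReport:
    """Run an infrastructure probe, then an application-level probe."""
    vm_ok = vm_probe()
    # Only probe the application if the VM itself answered.
    app_ok = vm_ok and app_probe()
    return HealthReport(vm_ok, app_ok)


report = check(vm_probe=lambda: True, app_probe=lambda: False)
print(report.status)  # application down
```

Application-aware clustering acts on the second layer of this check, not just the first.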
The good news is that advanced application-aware clustering and replication software gives companies the flexibility to use the cloud in new ways for HA/DR. For example, you can create a cluster with nodes in different cloud availability zones for added protection from local disasters. Alternatively, you can create a two-node cluster in the cloud and replicate data to a third node in a different cloud region, gaining region-wide DR without the cost of a physical DR site.
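The two layouts in the paragraph above can be expressed as a small layout check. The `ClusterNode` type, the role labels, and the region and zone names are hypothetical placeholders, not any cloud provider's identifiers. The rules encode only what the paragraph describes: HA nodes in different availability zones, and a DR node in a different region from the HA nodes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterNode:
    name: str
    region: str
    zone: str
    role: str  # "ha" for the local failover nodes, "dr" for the remote copy


def check_topology(nodes: list) -> list:
    """Return a list of problems found in a hypothetical HA/DR layout."""
    problems = []
    ha = [n for n in nodes if n.role == "ha"]
    dr = [n for n in nodes if n.role == "dr"]
    # HA nodes should not share an availability zone.
    ha_zones = [(n.region, n.zone) for n in ha]
    if len(set(ha_zones)) < len(ha_zones):
        problems.append("two HA nodes share an availability zone")
    # A DR node should sit outside every region used by the HA nodes.
    ha_regions = {n.region for n in ha}
    for n in dr:
        if n.region in ha_regions:
            problems.append(f"DR node {n.name} is in the same region as the HA nodes")
    return problems


layout = [
    ClusterNode("node-a", "region-1", "zone-a", "ha"),
    ClusterNode("node-b", "region-1", "zone-b", "ha"),
    ClusterNode("node-c", "region-2", "zone-a", "dr"),
]
print(check_topology(layout))  # []
```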
In today’s IT infrastructures that combine on-prem, cloud, hybrid, and multi-cloud environments, it is crucial to choose high availability and disaster recovery solutions that enable configuration flexibility and application-aware intelligence.