
Five Strategies for Achieving Application High Availability

Solutions Review’s Contributed Content Series is a collection of contributed articles written by thought leaders in enterprise tech. In this feature, SIOS Technology’s Todd Doane offers five strategies for achieving application high availability.

Business-critical applications, such as SAP, S/4 HANA, SQL Server, and MaxDB, serve as the backbone of many organizations. Any downtime can result in severe consequences, such as lost revenue, unproductive employees, and dissatisfied customers. To address these concerns, implementing robust high availability (HA) and disaster recovery (DR) strategies is essential. In this article, I will discuss five case studies from different industries, highlighting the unique challenges each faced and the strategies employed to achieve application high availability.

Leading Automobile Manufacturer

When a leading automobile manufacturer replaced the warehouse management systems at three locations with new cloud-based systems, it needed a new way to provide high availability without adding complexity or slowing performance.

Each warehouse relied on a management system that handled orders and inventory for automobile parts and accessories sold by dealers. The manufacturer must ensure that any parts or accessories ordered within a defined acceptance period are delivered to the dealerships the next day. Therefore, even short periods of downtime for mission-critical systems could have a significant impact on the business. To ensure next-day delivery of an order entered at 4:29 PM, the warehouse management system has to process and display it by 4:40 PM so that it can be put on the last truck or flight of the day. If a problem arises, the system needs to recover in less than 10 minutes.
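The timing constraint above can be checked with simple arithmetic. The following is a minimal sketch, not the manufacturer's actual tooling: the function name `recovery_budget` and the one-minute processing allowance are assumptions made for illustration, and the calendar date is arbitrary.

```python
from datetime import datetime, timedelta


def recovery_budget(entered: datetime, deadline: datetime,
                    processing: timedelta) -> timedelta:
    """Return how long a recovery may take before the deadline is missed.

    `processing` is the time the system needs to process and display the
    order once it is running (an assumed value, not from the article).
    """
    return (deadline - entered) - processing


# Order entered at 4:29 PM, must be displayed by 4:40 PM (date is arbitrary).
order_entered = datetime(2024, 1, 1, 16, 29)
display_deadline = datetime(2024, 1, 1, 16, 40)

budget = recovery_budget(order_entered, display_deadline,
                         processing=timedelta(minutes=1))
print(budget)  # 0:10:00
```

With an 11-minute window and roughly a minute reserved for processing, about 10 minutes remain for recovery, which matches the target stated above.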

However, the legacy system required IT personnel to manually stop the system and switch operations to redundant hardware until the problem was fixed, a process requiring two to four hours of an IT person’s time. The company needed new systems that would eliminate the drain on IT resources and reduce the negative impact on operations. It decided to migrate its legacy on-prem system to the public cloud and implemented a high availability solution that automatically fails over from the primary server node to a secondary node, maintaining uninterrupted operations.

The company achieved a resilient infrastructure, and the benefits of moving the warehouse management system to the cloud while ensuring high availability became evident during the pandemic: with its systems in the cloud, the company could manage them remotely.

Large Financial Services Firm

One of the oldest financial services firms in China operates across 14 countries, in locations including Shanghai, Hong Kong, New York, London, Tokyo, and Singapore, and has a stable base of approximately 18 million customers. The company relied on securities trading applications based on Oracle Database. While the firm’s IT team was backing up these applications and databases frequently, they could not recover operations quickly in the event of a failure or disaster.

The firm decided to implement an HA clustering solution that would ensure they could reliably meet their service level agreements (SLAs) for HA as well as their stringent recovery time and recovery point objectives (RTO, RPO). The firm created a two-node cluster on physical servers using clustering software, which monitors the entire application stack: network, storage, OS, and application. In the event of a failure, the software orchestrated the failover of application operation to the secondary node in the cluster. Moreover, application-aware modules in the software simplified the task of configuring a cluster for Linux environments.
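The monitoring-and-failover pattern described above can be illustrated with a small model. This is a simplified sketch, not the clustering product's implementation: the `Node` and `TwoNodeCluster` classes, the node names, and the health-check callables are all hypothetical, and real clustering software also handles fencing, quorum, and storage hand-off.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Each check returns True when its layer of the stack is healthy.
HealthCheck = Callable[[], bool]


@dataclass
class Node:
    name: str
    checks: Dict[str, HealthCheck] = field(default_factory=dict)

    def failed_layers(self) -> List[str]:
        """Return the names of the stack layers whose checks fail."""
        return [layer for layer, check in self.checks.items() if not check()]


@dataclass
class TwoNodeCluster:
    active: Node
    standby: Node

    def monitor_once(self) -> str:
        """Run one monitoring pass and fail over if the active node is unhealthy.

        Failover happens only when the standby itself is healthy.
        Returns the name of the node that is active after the pass.
        """
        if self.active.failed_layers() and not self.standby.failed_layers():
            self.active, self.standby = self.standby, self.active
        return self.active.name


def healthy() -> bool:
    return True


def broken() -> bool:
    return False


layers = ("network", "storage", "os", "application")
primary = Node("node-a", {**{layer: healthy for layer in layers}, "storage": broken})
secondary = Node("node-b", {layer: healthy for layer in layers})

cluster = TwoNodeCluster(active=primary, standby=secondary)
print(cluster.monitor_once())  # node-b
```

Checking every layer rather than only the application process is what lets the cluster react to a storage or network fault before the application itself reports an error.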

The firm has been relying on the clustering solution for many years now, consistently meeting their availability SLAs.

Bonfiglioli

With more than 3,600 employees in locations around the world, a leading Italian manufacturing company relies on applications such as its SAP ERP application to keep business operations running smoothly. Since most of their applications run on the Microsoft Windows operating system, they used guest-level Windows Server failover clustering (WSFC) in their VMware environment to provide HA and DR.

The company’s IT team implemented a program to move part of its IT operations into the Microsoft Azure cloud and to leverage Azure as a disaster recovery site. A key component of their migration planning was ensuring that they could still meet strict SLAs for application performance and availability in the cloud. Operations in their on-prem environment were protected by VMware clustering that allows WSFC to manage failover of operations to a secondary server in the event of a system failure. Providing the same protection in the cloud, however, posed a challenge: guest clustering with shared-bus disks is not a viable solution in a public cloud.

To protect their sensitive, business-critical SAP ERP system, Bonfiglioli chose an SAP-certified HA/DR solution, which was simple to install, transparent to the OS, and cost-effective. Creating a cluster in VMware using Raw Device Mapping (RDM) and shared-bus disks is challenging and limits how the VMs can be backed up. The new HA/DR software removed these barriers by enabling Bonfiglioli to create a cluster environment without the need for RDM. They simply created a two-node cluster in VMware and added the software to synchronize storage in each cluster instance. The synchronized storage appears to WSFC as a single shared storage disk in their on-premises environment.

Leading Beverage Manufacturer

A leading Hong Kong-based beverage manufacturer relies on an SAP ERP system running in a Red Hat Linux environment. They used a large Storage Area Network (SAN) for data storage. In their on-premises data center, the company provided uptime protection for this system using data replication and backups of the SAN.

The company’s IT department determined that they could achieve true HA (99.99% uptime), DR, and cost savings by migrating to the cloud and using failover clustering to protect their critical SAP system. However, they realized that a SAN is not practical in some clouds and is not available in others. The company chose to move its SAP environment to Amazon EC2 and use a clustering solution that is certified by SAP for both NetWeaver and DB2 and is fully tested and supported on Red Hat Enterprise Linux and other Linux distributions.
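To put the 99.99% uptime figure in concrete terms, the short calculation below converts an availability percentage into an annual downtime allowance. It is an illustrative sketch: the function name `allowed_downtime_minutes` and the 365-day year are assumptions, not part of any vendor SLA definition.

```python
def allowed_downtime_minutes(uptime_percent: float, days: float = 365.0) -> float:
    """Return the downtime, in minutes, permitted over `days` at a given uptime."""
    total_minutes = days * 24 * 60
    return total_minutes * (1 - uptime_percent / 100)


# "Four nines" over a 365-day year.
print(round(allowed_downtime_minutes(99.99), 2))  # 52.56
```

At 99.99%, the whole year's allowance is under an hour, so a single manual recovery lasting a few hours would exceed it.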

The clustering software they chose enabled SANless failover clustering to provide full HA and DR for SAP. The software’s application-specific modules automated configuration steps and ensured that failover orchestration followed application best practices.

US National Capital Region

A government agency in the National Capital Region (NCR) of the United States created a data exchange platform designed to provide emergency services agencies with secure access to data and applications, including a system that connects computer aided dispatch (CAD) systems called CAD-to-CAD (C2C).

In the earlier stages of the project, the C2C Exchange database used Microsoft Always On Availability Groups to protect the SQL Server Enterprise Edition instance that the system ran on. As the project expanded, the NCR IT team migrated the C2C platform to the Azure cloud for added flexibility and improved service levels. However, the service levels guaranteed by cloud vendors covered hardware operability but did not include application availability.

NCR implemented cost-effective HA clustering software that allowed them to protect C2C Exchange application availability using SQL Server Standard Edition, avoiding the higher licensing costs associated with the Enterprise Edition.

The software used host-based, block-level replication to synchronize local storage on all database cluster nodes. If Windows Server Failover Clustering detects an issue, application operation is automatically moved to a secondary cluster node with no manual intervention required. Since deploying the clustering solution, there have been no downtime issues involving data loss, and C2C Exchange operation has continued without end users being impacted by a prolonged reduction in service.
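The idea behind block-level replication can be shown with a toy in-memory model. This is only a sketch under stated assumptions: the `ReplicatedVolume` class is hypothetical, the 4-byte block size is chosen for readability, and synchronous mirroring is assumed because the article does not say whether the product replicates synchronously or asynchronously.

```python
BLOCK_SIZE = 4  # bytes per block; unrealistically small, for illustration only


class ReplicatedVolume:
    """Toy model: every block write to the primary is mirrored to each replica."""

    def __init__(self, num_blocks: int, replicas: int) -> None:
        size = num_blocks * BLOCK_SIZE
        self.primary = bytearray(size)
        self.replicas = [bytearray(size) for _ in range(replicas)]

    def write_block(self, index: int, data: bytes) -> None:
        """Write one block locally, then copy the same block to all replicas."""
        if len(data) != BLOCK_SIZE:
            raise ValueError("data must be exactly one block long")
        start = index * BLOCK_SIZE
        self.primary[start:start + BLOCK_SIZE] = data
        for replica in self.replicas:
            replica[start:start + BLOCK_SIZE] = data  # assumed synchronous copy

    def in_sync(self) -> bool:
        """Return True when every replica matches the primary byte for byte."""
        return all(replica == self.primary for replica in self.replicas)


volume = ReplicatedVolume(num_blocks=8, replicas=2)
volume.write_block(3, b"data")
print(volume.in_sync())  # True
```

Because replication works on blocks rather than files or database records, the secondary node's local disk holds the same bytes as the primary's, which is what allows it to take over the database after a failover.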

In conclusion, the journey towards achieving high availability for critical applications varies across industries. By examining multiple case studies, organizations can gain insights into the challenges specific to their domain and create effective HA/DR strategies accordingly.
