Expert Reveals 3 High Availability for SQL Server in AWS Options

By Dave Bermingham
Best Practices,

High Availability for SQL Server

Solutions Review’s Expert Insights Series is a collection of contributed articles written by industry experts in enterprise software categories. In this feature, SIOS Technology Corp‘s Dave Bermingham reveals three high availability for SQL Server in AWS options to consider.

Amazon has always been big on choices, and Amazon Web Services (AWS) is no different. When it comes to ensuring the availability of an instance of Microsoft SQL Server running on AWS, you have quite a number of choices – from a Platform-as-a-Service (PaaS) offering, where pretty much everything is managed for you, to an Infrastructure-as-a-Service (IaaS) offering, where AWS manages some aspects of your infrastructure and you manage the rest.

If you want true high availability – meaning that your SQL Server database will be accessible 99.99 percent of the time – your choices narrow in certain ways. The PaaS offering — Amazon RDS – falls out of the running because its service level agreement (SLA) only guarantees an availability level of 99.95 percent. Within the realm of AWS IaaS offerings, though, your options for achieving high availability (HA) widen: Will you create a shared SAN-like data repository using an Amazon FSx file system? Will you use SQL Server’s built-in Availability Groups (AG) functionality? Or, will you opt for door number three and create a SANless cluster? Each choice is designed to ensure that you can access your SQL Server database at least 99.99 percent of the time, but each choice comes with different advantages and drawbacks.

Ensure Accessibility Through Multiple Availability Zones

Each of these options — Amazon FSx, SQL Server Availability Groups, and SANless Clustering — are designed to ensure that you have access to an up-to-date copy of your SQL Server database. This is important because achieving HA in the cloud requires that you have compute and storage infrastructure in at least two separate cloud data centers (or availability zones [AZs], as they are called in AWS-speak). If the infrastructure running in one of those AZs goes offline – for any reason – the infrastructure in another AZ can take over in seconds to continue running your SQL Server-based tasks.

AWS makes it easy to configure the compute and storage resources needed to meet your application and performance demands. The question is, how will you ensure that any updates to your SQL database are captured not only in the active operational database but also in the secondary instances of the database residing in the backup AZs? That’s where understanding the upsides and downsides of Amazon FSx, SQL Server AGs, and SANless Clustering becomes important.

High Availability for SQL Server

Amazon FSx

Amazon FSx makes it easy and cost-effective to launch, run, and scale shared file storage in the cloud. With FSx, you get the reliability of deploying your SQL Server cluster across multiple AZs, and the simplicity and cost efficiency of using a Failover Cluster Instance (FCI) deployment that relies on shared, replicated storage. Being able to use FCI means 50-60% less license costs (as you can now use SQL Standard instead of SQL Enterprise), separating your storage needs from your SQL Server nodes, and providing up to 30% higher performance by leveraging block-level replication instead of database-level transactional replication.

Amazon FSx provides two file systems – Amazon FSx for Windows File Server and Amazon FSx for NetApp ONTAP – that provide a shared storage option like that provided by a SAN and support the SMB protocol. VMs in separate AZs can be configured as a Windows Failover Cluster Instance (FCI), and each node in the FCI connects to a single shared instance of an FSx file system. If a VM actively engaging with a SQL database housed in FSx storage suddenly goes dark, the FCI will fail over to another VM, which will interact with the same SQL database in shared storage.

When configuring an FSx file system, you can deploy it in a single-AZ or multi-AZ. For HA, you should always choose multi-AZ in order to survive the failure of an entire AZ. This configuration has a primary file server in one zone and a secondary file server in the other. In normal circumstances, the active SQL Server cluster node will reside in the same AZ as the primary storage node. However, there is no guarantee that it will always be that way.

SQL Server Availability Groups

SQL Server itself offers another way to replicate data among physically separate infrastructures. Its built-in Availability Group (AG) functionality can automatically replicate user-named SQL databases from one member of an availability group to another. If the active instance of SQL Server suddenly goes offline, the secondary instance in another AZ can take over, interacting with its own up-to-date copy of the database. This avoids the scenario in which a secondary VM might be forced to interact with a copy of the database housed in a different AZ.

There are some potential downsides to relying on SQL Server’s AG functionality. If you use SQL Server Standard Edition, only the “basic” AG functionality is available. That is, you can only replicate a single database to a single remote location. If you have more than one SQL database or you want to replicate to multiple secondary AZs, you’ll need to use the “Always On” AG functionality found in SQL Server Enterprise Edition, which comes at a significant cost. Know too that the AG functionality prior to SQL Server 2022, which introduced Contained AGs, only replicates user-named SQL databases. Earlier versions do not replicate any other data in storage, nor do they replicate the system databases managing jobs, passwords, and other activities—so you’ll need to find another way to replicate this data if you expect to be able to access them in a secondary AZ.

SANless Clustering

Finally, there is a third option for replicating data among the AZs comprising your HA infrastructure: You can create an FCI that relies on what is known as SANless clustering technology. Like Amazon FSx, SANless Clustering makes it easy and cost-effective to launch, run, and scale shared file storage in the cloud. Instead of configuring a cloud FCI that depends on shared storage, though, you can configure one that uses third-party tools to replicate data synchronously from primary to secondary storage across each of the AZs. If the primary VM in the FCI goes dark, the FCI will fail over to secondary infrastructure in a remote AZ, where an up-to-date replica of the active SQL database is ready to go into service.

While this sounds like the approach informing the AG functionality in SQL Server, it differs in significant ways. The SANless clustering software is application agnostic, so it will replicate any data in storage (including the system databases that AG does not replicate). Moreover, unlike the “basic” AG functionality available in SQL Server Standard Edition, the SANless clustering software will replicate any number of SQL databases to any number of targets. If you only need to rely on SQL Server Standard Edition to support your applications, you can continue to do so — while gaining the data replication flexibility that you’d otherwise have to license SQL Server Enterprise Edition to obtain.

As for the potential downsides to using SANless clustering to facilitate data replication among the VMs in your AWS HA infrastructure? It’s not an AWS managed service like FSx, so your administrators must set up and monitor the software supporting the SANless clustering functionality. And, because the SANless clustering functionality is not built into SQL Server the way the AG functionality is, you’ll have to license the solution separately for each of the VMs in your FCI.

This article was written by Dave Bermingham on March 30, 2023

Dave Bermingham

Dave Bermingham is the Director of Customer Success at SIOS Technology. He is recognized within the technology community as a high-availability expert. He has been honored to be elected a Microsoft MVP for the past 12 years: 6 years as a Cluster MVP and 6 years as a Cloud and Datacenter Management MVP. Dave holds numerous technical certifications and over thirty years of IT experience, including in finance, healthcare, and education.

Minimizing Downtime in the Modern Data Protection Era - June 29, 2021

Why Microsoft 365 Backup Is Now a Cyber Resilience Requirement

Best Practices

Expert Reveals 3 High Availability for SQL Server in AWS Options

Ensure Accessibility Through Multiple Availability Zones