Architect a disaster recovery for SQL Server on AWS: Part 1

By mullaned2002

July 29, 2021

872

In today’s world, it’s just a matter of time before disaster happens, and when it happens it’s essential to recover your SQL Server databases and bring the systems online with minimal data loss and downtime. To respond to and recover from an outage of SQL Server database access, high availability (HA) and disaster recovery (DR) solutions are utilized. High availability efforts entail what to do to prevent an outage, disaster recovery addresses what is done to re-establish high availability after an outage. AWS can helps you lower the cost and complexity of a HA and DR solutions for SQL Server workloads.

Many customers seek our guidance to choose the right HA or DR solution for SQL Server on AWS, but there is no one-size-fits-all solution. There are trade-offs involving potential for data loss, time to recovery, operational complexity, and operational cost. In some cases, you can even combine these technologies to implement a solution that has both HA and DR capabilities.

For many general-purpose use cases, customers can work with Amazon Relational Database Service (Amazon RDS) for SQL Server which is a managed service, that makes it easier to setup, operate and scale SQL Server on AWS. Customers can also run and manage their own SQL Server databases on Amazon Elastic Compute Cloud (Amazon EC2).

In this blog series, we compare and contrast the DR solutions available for SQL Server on Amazon EC2 and help you to understand the nature of these trade-offs, and the cost and complexity of implementing DR for SQL Server workloads on AWS. We plan to write a similar series for Amazon RDS for SQL Server, you can learn more about Amazon RDS resilience in the User Guide.

This post introduces the technical terminology and DR solutions available for different SQL Server versions and editions. In part 2, we explore SQL Server backup and restore, SQL Server log shipping, and SQL Server database mirroring. In part 3, we dive deep on SQL Server Always on Failover Cluster Instances (FCIs) and Always On availability groups. Finally, in part 4, we introduce AWS Database Migration Service (AWS DMS), CloudEndure Disaster Recovery, and we wrap up all technologies discussed in the series.

Technical terminology

We use the following terms throughout this series of posts:

High availability (HA) – HA protects against data center, Availability Zone, server, network, and storage subsystem failures to keep your business running without downtime.
Disaster recovery (DR) – DR is about preparing for and recovering from a disaster. Any event that has a negative impact on your business continuity or finances could be termed a disaster. This could be hardware or software failure, a network outage, a power outage, physical damage to a building like fire or flooding, human error, or some other disruptive incident.
Recovery Time Objective (RTO) – RTO is the maximum acceptable delay (defined by your organization) between the interruption of service and restoration of service. This determines what is considered an acceptable time window when service is unavailable.
Recovery Point Objective (RPO) – RPO is the maximum acceptable amount of time (defined by your organization) since the last data recovery point. This determines what is considered an acceptable loss of data between the last recovery point and the interruption of service.
Failover – The process where the service is automatically transitioned to standby node in case the primary node is no longer accessible to applications. Failover does not require any user intervention.
Switchover – In contrast to failover, a switchover is the process where the service is intentionally transitioned from primary node to a standby node (new primary node) for testing and scheduled maintenance. Switchover does require user intervention.
Failback – Failback is the reverse of failover or switchover once the original node recovered to healthy state and ready to resume service. Failback is always user initiated.
Active-passive – With an active-passive cluster configuration where service is always delivered by a single node (primary node) unless there is an outage. The service transfers to dedicated standby node(s) when outage is detected. This transfer of service could be via automated failover or user-initiated switchover
Active-active – With an active-active cluster configuration, where service is delivered by both primary and secondary nodes at the same time. In this configuration doesn’t require a failover if only one of the active nodes becomes unavailable. If one active node is unavailable, the other active nodes continues to be available.
Pilot light – Replicate your data from one region to another and provision a copy of your core workload infrastructure. Resources required to support data replication and backup such as databases and object storage are always on. Other elements such as application servers are loaded with application code and configurations, but are switched off and are only used during testing or when Disaster Recovery failover is invoked.
Warm standby – A warm standby maintains a scaled-down but fully functional version of your workload always running in the DR Region. Business-critical systems are fully duplicated and are always on, but with a scaled-down fleet. When the time comes for recovery, the system is scaled up quickly to handle the production load. The more scaled up the warm standby is, the lower RTO and control plane reliance. When scaled up to full scale, this is known as a hot standby.
Cold standby – A cold standby maintains a redundancy of your primary node. The standby node is in shutdown mode and It needs to be turned on manually to keep sync with the primary node.
Synchronous replication – Replication is copying of data changes from one database to another database. Most synchronous replication products write data to primary storage and the replica simultaneously. As such, the primary copy and the replica should always remain synchronized.
Asynchronous replication – Asynchronous replication products copy the data to the replica after the data is already written to the primary storage. The replication process runs with minimum latency in relation to a secondary replica. If the primary is configured as asynchronous mode, it only allows manual failover. In case of a failover event, some data loss may occur.

The following table summarizes what you should expect from HA and DR.

High Availability
Disaster Recovery
Failover
Automatic
Manual
Failback
Automatic
Manual
Replication
Synchronous
Asynchronous
Goal/Intention
Retaining Service/Service Availability
Retaining Data/Service Continuity

SQL Server version and edition requirements

SQL Server installation requirements vary based on your application needs. The different editions of SQL Server accommodate the unique performance, runtime, and price requirements of organizations and individuals. The SQL Server components that you install also depend on your specific requirements. The following tables help you understand how to make the best choice among the editions and components available in SQL Server.

The following table summarizes DR solutions by SQL Server version. We will explore these options in this blog series.

Solution
SQL 2008
SQL 2008 R2
SQL 2012
SQL 2014
SQL 2016
SQL 2017
SQL 2019
SQL Server Backup and Restore
Yes
Yes
Yes
Yes
Yes
Yes
Yes
SQL Server Log Shipping
Yes
Yes
Yes
Yes
Yes
Yes
Yes
SQL Server Database Mirroring (Deprecated)
Yes
Yes
Yes
Yes
Yes
Yes
Yes
SQL Server Always On Failover Cluster Instances
Yes
Yes
Yes
Yes
Yes
Yes
Yes
SQL Server Always On Availability Groups
No
No
Yes (4 replicas)
Yes (8 replicas)
Yes (8 replicas)
Yes (8 replicas)
Yes (8 replicas)
SQL Server Distributed Always On Availability Groups
No
No
No
No
Yes (max 18 replicas between 2 AGs)
Yes (max 18 replicas between 2 AGs)
Yes (max 18 replicas between 2 AGs)
AWS DMS
Yes
Yes
Yes
Yes
Yes
Yes
Yes
CloudEndure Disaster Recovery
Yes
Yes
Yes
Yes
Yes
Yes
Yes

The following table summarizes DR solutions by SQL Server edition.

Solution
HA
DR
Enterprise Edition
Standard Edition
Web Edition
SQL Server Backup and Restore
No
Yes
Yes
Yes
Yes
SQL Server Log Shipping
No
Yes
Yes
Yes
Yes
SQL Server Database Mirroring (Deprecated)
Yes
Yes
Yes
Yes (Full safety only)
Yes (witness only)
SQL Server Always On Failover Cluster Instances
Yes
No
Yes (max 16 replicas)
Yes (2 replicas)
No
SQL Server Always On Availability Groups
Yes
Yes
Yes (max 8 replicas)
Yes (2 replicas as Basic AG)
No
SQL Server Distributed Always On Availability Groups
Yes
Yes
Yes (max 18 replicas between 2 AGs)
No
No
AWS DMS
No
Yes
Yes
Yes
Yes
CloudEndure Disaster Recovery
No
Yes
Yes
Yes
Yes

Summary

The choice of disaster recovery solution depends on your requirements and budget, as well as on the version and edition of SQL Server product you install. In some use cases, it’s even beneficial to combine multiple DR technologies.

In this post, we discussed DR terminology, and also SQL versions and editions to implement DR capabilities with AWS. In our next post, we talk through SQL backup and restore, SQL log shipping, and SQL database mirroring technologies for DR use cases.

About the Author

Ganapathi Varma Chekuri is a Database Specialist Solutions Architect at AWS. Ganapathi works with AWS customers providing technical assistance and designing customer solutions on database projects, helping them to migrate and modernize their existing databases to AWS cloud.

Baris Furtinalar is a Senior Solutions Architect who is part of specialists in Microsoft architectures team at AWS. He is passionate about cloud computing and he believes this shift to the cloud helps companies transform their businesses, gain agility, and improve operational responsiveness. He comes from a diverse background including SQL database administration, virtualization and system security. He designed, implemented and supported Windows/SQL server deployments since 2000.

Architect a disaster recovery for SQL Server on AWS: Part 1

Technical terminology

SQL Server version and edition requirements

Summary

About the Author

Monitor query plans for Amazon Aurora PostgreSQL

Handle tables without primary keys while creating Amazon Aurora PostgreSQL zero-ETL integrations with Amazon Redshift

Handle tables without primary keys while creating Amazon Aurora MySQL or Amazon RDS for MySQL zero-ETL integrations with Amazon Redshift

LEAVE A REPLY Cancel reply

Most Popular

The overwhelmed person’s guide to Google Cloud: week of April 11

Introducing automatic training for solutions in Amazon Personalize

Monitor query plans for Amazon Aurora PostgreSQL

Cloud CISO Perspectives: 20 major security announcements from Next ‘24

Recent Comments

EDITOR PICKS

Exploring the Click Element Variable in Google Tag Manager

How to track events with Google Tag Manager and Google Analytics

Data Layer Variable in GTM: What, Why, and Where?

POPULAR POSTS

Digital transformation starts with living systems leadership

Export Google Cloud data into Elasticsearch with Dataflow templates

Introducing Google Cloud Deploy: Managed continuous delivery to GKE

POPULAR CATEGORY