Migrate on-premises SQL Server workloads to Amazon RDS Custom for SQL Server using distributed availability groups

By mullaned2002

June 28, 2023

748

In this post, we provide a SQL Server Always On cluster database migration pattern solution to Amazon Relational Database Service (Amazon RDS) Custom for SQL Server using distributed availability groups. This solution helps reduce the migration downtime through continuous data synchronization combined with a failover process. This post is not a high availability and disaster recovery solution; it is for migration purposes only. We recommend not scaling compute or upgrading your RDS instance to avoid host replacement issues during the migration journey.

Amazon RDS Custom is a managed database service for legacy, custom, and packaged applications that require access to the underlying OS and DB environment. Amazon RDS Custom for SQL Server automates the setup, operation, and scaling of databases in the cloud, while granting access to the database and underlying operating system to configure settings, install drivers, and enable native features to help you meet the dependent application’s requirements.

A distributed availability group is a special type of availability group that spans two separate availability groups. The availability groups that participate in a distributed availability group don’t need to be in the same location. They can be physical, virtual, on premises, in the public cloud, or anywhere that supports an availability group deployment. This includes cross-domain and even cross-platform—such as between an availability group hosted on Linux and one hosted on Windows. As long as two availability groups can communicate, you can configure a distributed availability group with them.

Solution overview

The solution typically involves 2-nodes, source (on premises) and destination (RDS Custom for SQL Server) clusters. For this post, our source and destination cluster nodes and configurations are represented as follows:

Source cluster primary node – Global primary (GP), referred to as Node1
Source cluster secondary node – Global secondary (GS), referred as Node2
Source cluster availability group – Availability group primary (AG1)
Source cluster Windows Server failover cluster – Windows Server failover cluster primary (WSFC1)
Destination cluster primary node – Forwarder primary (FP, RDS Custom)
Destination cluster availability group – Availability group secondary (AG2)
Destination cluster Windows Server failover cluster – Windows Server failover cluster secondary (WSFC2)

The availability mode between AG1 nodes is synchronous-commit. The availability mode of the distributed availability group between two clusters through GP and FP is asynchronous-commit. Domain Name System (DNS) is used for communication. The failover between AG1 and AG2 through the distributed availability group is manual.

The following are the high-level steps for the solution:

Configure Windows firewall inbound rules and VPC security group rules.
Set up connectivity from on premises to the AWS Cloud.
Set up and configure an on-premises Always On AG1 cluster as primary.
Create RDS Custom for SQL Server (FP).
Configure the RDS Custom AG2 cluster with listener.
Create a distributed availability group between AG1 and AG2.
Join AG2 to the distributed availability group.
Join the database on FP (RDS Custom).
Migrate the on-premises database to Amazon RDS Custom for SQL Server through the distributed availability group.

The following flowchart illustrates the solution workflow.

The following diagram illustrates the solution architecture.

The following diagram illustrates the solution architecture for the multi-node on-premises cluster.

Prerequisites

We assume that you have prior knowledge regarding setting up Always On availability groups. For additional information, refer to What is an Always On availability group. We use SQL Server 2019 Enterprise Edition and Windows Server 2019. In this post, we migrate a database using a distributed availability group. For additional information, refer to Distributed availability groups.

To follow this post, the following prerequisites are required:

The AWS Command Line Interface (AWS CLI) installed and configured
An AWS account and appropriate permissions to interact with resources in your AWS account
Amazon Virtual Private Cloud on AWS

This solution can incur costs in your AWS account. Refer to Amazon RDS Custom for SQL Server Pricing for more information. Make sure you remove the resources when you’re done with the solution.

We recommend that you set up Always On in a non-production instance and run end-to-end validations before you implement this solution in a production environment. It’s also recommended to follow the preceding high-level steps in the same order of sequence.

Configure Windows firewall inbound rules and VPC security group rules

For successful Windows failover cluster creation and communication to happen, you must open the TCP ports 1433, 1434, 4022, 5022, 5023, and 135 at the Windows operating system level and VPC security group level.

For more information about creating inbound port rules in Windows, refer to Create an Inbound Port Rule. To learn more about security groups, refer to Control traffic to your AWS resources using security groups.

Set up connectivity from on premises to the AWS Cloud

It’s important to establish a secure connection from on premises to the AWS Cloud so both can communicate with each other. You can adopt different methodologies to achieve this, for example using AWS Transit Gateway, AWS Direct Connect, or AWS VPN.

Set up and configure an on-premises Always On AG1 cluster as primary

For this post, we create a 2-node Amazon Elastic Compute Cloud (Amazon EC2) SQL Server Always On availability group cluster (Node1 and Node2) using the following quick start to represent the on-premises cluster nodes. If your on-premises cluster is a single-node cluster with an Always On listener setup, proceed to following section. If your on-premises cluster is a single-node workgroup setup, refer to the preceding solution overview for next steps.

Create RDS Custom for SQL Server (FP)

Launch FP (RDS Custom) using the following AWS CLI script and update the parameters accordingly. For more information about how to create an RDS Custom for SQL Server DB instance, refer to Creating and connecting to a DB instance for Amazon RDS Custom for SQL Server.

aws rds create-db-instance
–engine custom-sqlserver-ee
–engine-version 15.00.4073.23.v1
–db-instance-identifier <XXXXX>
–db-instance-class db.m5.xlarge
–db-subnet-group rds-custom-secondary
–master-username admin
–master-user-password <XXXXX>
–backup-retention-period 7
–port 1433
–kms-key-id <kms-key-id>
–custom-iam-instance-profile RDSCustomIAMProfile
–vpc-security-group-ids security-group
–no-publicly-accessible
–region us-east-2

Configure an RDS Custom AG2 cluster with listener

Now let’s configure the Always On cluster. Run the commands in this section in Windows PowerShell 64-bit as an administrator. Note that multiple server restarts will occur as a part of the setup process.

Install the failover clustering feature on FP (RDS Custom)

Run the following command on FP (RDS Custom) to install the Windows failover clustering feature:

Install-WindowsFeature -Name Failover-Clustering –IncludeManagementTools

Configure the DNS suffix list

Run the following commands on FP (RDS Custom) to append the existing DNS suffix list:

$dnsCGSetting = Get-DnsClientGlobalSetting
$dnsCGSetting.SuffixSearchList += “<node1domainname>”

set-DnsClientGlobalSetting -SuffixSearchList $dnsCGSetting.SuffixSearchList
Clear-DnsClientCache

Configure the DNS server for FP (RDS Custom)

Run the following commands on FP (RDS Custom) to set preferred and alternate DNS server addresses.

After you run the Get-DnsClientServerAddress command, get the InterfaceAlias name of the instance (in this example, Ethernet 2 is the InterfaceAlias name) and pass that in the Set-DNSClientServerAddress command. Additionally, pass the preferred and alternate IP address of the Active Directory server:

Get-DnsClientServerAddress
Set-DNSClientServerAddress “Ethernet 2” -ServerAddresses (“Preferred DNS IP Address of AD server”, “Alternate DNS IP address of AD server“)

The following is an example screenshot.

Join FP (RDS Custom) to the on-premises (WSFC1) domain

We join the FP (RDS Custom) to the on-premises domain, which is onprem.com in this example.

Run the following command on FP (RDS Custom) and restart the server. Make sure to persist this change. Refer to Persist your OS-level customization within Amazon RDS Custom for SQL Server using Custom Engine Version (CEV) for more information, in the event of host replacement/scale compute.

Add-Computer -DomainName <onprem.com> -Credential <onpremadmin> -Restart -Force

Run the following commands on FP (RDS Custom) to verify the domain name:

Get-WmiObject -Namespace rootcimv2 -Class Win32_ComputerSystem | Select Domain

Configure the SQL Server log on as a service using a domain user on FP (RDS Custom)

Change the SQL Server service account’s log on as a service to a domain user on FP (RDS Custom) using the following PowerShell code (change the parameters accordingly). Note that this will restart the SQL services automatically. If the PowerShell command isn’t completing, restart the SQL services manually and rerun the command.

[System.Reflection.Assembly]::LoadWithPartialName(‘Microsoft.SqlServer.SqlWmiManagement‘)| Out-Null
$smowmi = New-Object Microsoft.SqlServer.Management.Smo.Wmi.ManagedComputer “<hostname>“
$wmisvc = $smowmi.Services | Where-Object {$_.Name -eq “SQLSERVERAGENT” -or $_.Name -eq “MSSQLSERVER“}
$wmisvc.SetServiceAccount(‘onpremdomainnameservice_accountname‘,’xxx‘)

Run the following commands on FP (RDS Custom) to verify the domain user name on the SQL Server service account:

Get-CIMInstance -Class Win32_Service -filter “Name=’MSSQLSERVER'” |
Select-Object Name,StartName| Sort-Object StartName

Add the SQL Server service account to the local administrator group

Add the SQL Server service account (domain user) to the local administrator’s group. Run the following command on FP (RDS Custom):

Add-LocalGroupMember -Group “Administrators” -Member <”onpremdomainnameservice_accountname”>

Run the following commands on FP (RDS Custom) to verify the domain user is part of the local admin group:

Get-LocalGroupMember -Group “Administrators”

Add secondary IPs on the cluster node elastic network interface

Run the following command on FP (RDS Custom) to add the cluster IP address and SQL Server Always On listener IP address, in the secondary IPs of the corresponding cluster node’s elastic network interface (ENI). Make sure to pick an unused IP address before adding the ENI. For additional information, refer to Multiple IP addresses

aws ec2 assign-private-ip-addresses –network-interface-id “eni-id-of-ec2″ –private-ip-addresses “private ip address of cluster“

aws ec2 assign-private-ip-addresses –network-interface-id “eni-id-of-ec2” –private-ip-addresses “private ip address of listener“

Create an Always On availability group AG2 with listener on FP (RDS Custom)

Now let’s create cluster and configure Always On availability group AG2 on FP (RDS Custom).

Create WSFC2

Run the following commands as the domain admin and PowerShell admin to create WSFC2 in PowerShell on FP (RDS Custom) by providing the cluster name, node name, and static IP:

$clustername = “Enter Cluster Name”
$Nodename = “Enter Node Name”
$StaticAddress = “Enter StaticAddress”
New-Cluster -Name $clustername -Node $Nodename.ToString() -StaticAddress $StaticAddress.ToString()

Enable the Always On availability group feature on FP (RDS Custom)

Run the following commands on FP (RDS Custom), changing the parameters accordingly. Note that the third command restarts SQL services. Additionally, PowerShell may wait longer to complete the command. In that case, restart the SQL services manually to move forward.

Import-Module “SQLPS“
Enable-SqlAlwaysOn -ServerInstance <“hostname”> -Force
get-service -Name MSSQLSERVER,SQLSERVERAGENT|Restart-Service -Force

Set up an endpoint and login in FP (RDS Custom)

Run the following TSQL commands on FP (RDS Custom) to create a cluster endpoint:

CREATE ENDPOINT [Hadr_endpoint]
STATE=STARTED
AS TCP (LISTENER_PORT = 5022, LISTENER_IP = ALL)
FOR DATA_MIRRORING (ROLE = ALL, AUTHENTICATION = WINDOWS NEGOTIATE,
ENCRYPTION = REQUIRED ALGORITHM AES)
GO

Run the following TSQL on FP (RDS Custom) to add the SQL Server service account as login and grant sysadmin privilege:

CREATE LOGIN [onpremdomainnameservice_accountname] FROM WINDOWS;
GO
EXEC master..sp_addsrvrolemember @loginame = N’onpremdomainnameservice_accountname‘, @rolename = N’sysadmin’
GO

Explicitly grant connect privilege on the endpoint to the SQL Server service account using the following command:

GRANT CONNECT ON ENDPOINT::Hadr_endpoint TO [onpremdomainnameservice_accountname];

Create AG2

Run the following TSQL on FP (RDS Custom):

CREATE AVAILABILITY GROUP [<AG2>]
WITH (
AUTOMATED_BACKUP_PREFERENCE = SECONDARY,
FAILURE_CONDITION_LEVEL = 3,
HEALTH_CHECK_TIMEOUT = 600000
)
FOR
REPLICA ON
‘RDSAMAZ-OFRQK90’ WITH
(
ENDPOINT_URL = ‘TCP://RDSAMAZ-OFRQK90:5022’,
AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
FAILOVER_MODE = AUTOMATIC,
BACKUP_PRIORITY = 30,
SECONDARY_ROLE (ALLOW_CONNECTIONS =NO
))

Create an AG2 listener

Make sure to pre-stage the listener object ahead of time in the active directory and provide the Windows cluster name object$ (in this example, WSFC2$). Create a computer objects permission explicitly for the listener object to be created successfully. For additional information, refer to Configure a listener for an Always On availability group.

Run the following TSQL on FP (RDS Custom):

USE [master]
GO
ALTER AVAILABILITY GROUP [<AG2>]
ADD LISTENER N’< listener_name>‘ (
WITH IP
((N’<listener IP address>‘, N’subnet mask IP‘)
)
, PORT=1433);
GO

Create a distributed availability group between AG1 and AG2

On GP (Node1), run the following command to create a distributed availability group. The file system directory structure of the database files that is created on GS (Node2), and FP (RDS Custom) should be identical to GP (Node1) in order for automatic seeding to work. If they’re different, you must manually prepare the secondary database. For more information, see Prepare a secondary database for an Always On availability group.

Note that for FP (RDS Custom), all SQL Server database files are stored on the D: drive by default, in the D:rdsdbdataDATA directory. If you create or alter the database file location to be anywhere other than the D: drive, then Amazon RDS Custom places the DB instance outside the support perimeter. For details, refer to Troubleshooting DB issues for Amazon RDS Custom.

Modify the AVAILABILITY GROUP and LISTENER_URL values according to your environment:

CREATE AVAILABILITY GROUP [<distributed_ag>]
WITH (DISTRIBUTED)
AVAILABILITY GROUP ON
‘<AG1>‘ WITH
(
LISTENER_URL = ‘tcp://<on_prem_listener_url>:<endpoint_port>‘,
AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
FAILOVER_MODE = MANUAL,
SEEDING_MODE = MANUAL
),
‘<AG2>‘ WITH
(
LISTENER_URL = ‘tcp://<aws_ag_listener_url>:<endpoint_port>‘,
AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
FAILOVER_MODE = MANUAL,
SEEDING_MODE = MANUAL
);
GO

Join AG2 to the distributed availability group

On FP, run the following to join AG2 to the distributed availability group:

ALTER AVAILABILITY GROUP [<distributed_ag>]
JOIN
AVAILABILITY GROUP ON
‘<AG1>‘ WITH
(
LISTENER_URL = ‘tcp://<on_prem_listener_url>:<endpoint_port>‘,
AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
FAILOVER_MODE = MANUAL,
SEEDING_MODE = MANUAL
),
‘<AG2>‘ WITH
(
LISTENER_URL = ‘tcp://<aws_ag_listener_url>:<endpoint_port>‘,
AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
FAILOVER_MODE = MANUAL,
SEEDING_MODE = MANUAL
);
GO

Join the database on FP (RDS Custom)

Manually prepare the database on FP (RDS Custom) with no recovery. You need to manually join the database to the availability group. Note that the database <database_name> is in AG1 before joining the database on FP (RDS Custom).

On FP (RDS Custom), run the following TSQL:

ALTER DATABASE [<database_name>] SET HADR AVAILABILITY GROUP = [<AG2>]

Migrate the on-premises database to Amazon RDS Custom for SQL Server through the distributed availability group

After you create the distributed availability group, you now have AG1 (on premises) replicating data to AG2 (AWS).

In a typical migration scenario, you set this up and let it run until your actual cutover day. When you’re ready for the migration, complete the following steps:

Run the following TSQL on GP (Node1) to understand the lag of FP (RDS Custom) to GP (Node1):

SELECT ag.name
, drs.database_id
, db_name(drs.database_id) as database_name
, drs.group_id
, drs.replica_id
, drs.synchronization_state_desc
, drs.last_hardened_lsn
FROM sys.dm_hadr_database_replica_states drs
INNER JOIN sys.availability_groups ag on drs.group_id = ag.group_id;

Stop all incoming application traffic to GP (Node1) so no write activities are occurring.
On both GP (Node1) and FP (RDS Custom), run the following code to set the distributed availability group to SYNCHRONOUS_COMMIT:

ALTER AVAILABILITY GROUP [<distributed_ag>]
MODIFY
AVAILABILITY GROUP ON
‘<AG1>‘ WITH
(
AVAILABILITY_MODE = SYNCHRONOUS_COMMIT
),
‘<AG2>‘ WITH
(
AVAILABILITY_MODE = SYNCHRONOUS_COMMIT
);

Wait until the status of the distributed availability group has changed to SYNCHRONIZED and all replicas have the same last_hardened_lsn (per database).
Rerun the TSQL on GP (Node1) and FP (RDS Custom) from Step 1 to ensure synchronization_state_desc and last_hardened_lsn are up to date.
On GP (Node1), set the role to SECONDARY:

ALTER AVAILABILITY GROUP [<distributed_ag>] SET (ROLE = SECONDARY);

At this point, the distributed availability group is not available.

On FP (RDS Custom), run the following to initiate the failover:

ALTER AVAILABILITY GROUP [<distributed_ag>] FORCE_FAILOVER_ALLOW_DATA_LOSS;

GP (Node1) and FP (RDS Custom) have swapped roles now. GP has become FP, and FP is the new GP. It is recommended to change the availability mode of AG1 and AG2 back to ASYNCHRONOUS_COMMIT to avoid latency issues.

Post-migration

In case of scale storage or degraded hardware, AWS will replace the old host with a new host and our automation will configure the SQL Server service account back with the NT Services account. If this happens, SMK can’t be automatically decrypted because it was encrypted by a different user (domain user). We recommend reverting the SQL Server service account to the NT Services account so that nothing will break after a degraded hardware host replacement or scale storage.

Rollback plan

To roll back the database and bring the on-premises cluster as primary, complete the following steps:

Run the following on the new GP (old FP) to set the availability mode to SYNCHRONOUS_COMMIT:

ALTER AVAILABILITY GROUP [<distributed_ag>]
MODIFY
AVAILABILITY GROUP ON
‘<AG1>‘ WITH
(
AVAILABILITY_MODE = SYNCHRONOUS_COMMIT
),
‘<AG2>‘ WITH
(
AVAILABILITY_MODE = SYNCHRONOUS_COMMIT
);

On GP (old FP), set the role to SECONDARY:

ALTER AVAILABILITY GROUP [<distributed_ag>] SET (ROLE = SECONDARY);

On FP (RDS Custom, old GP), run the following to initiate the failover:

ALTER AVAILABILITY GROUP [<distributed_ag>] FORCE_FAILOVER_ALLOW_DATA_LOSS;

Remove the on-premises dependencies

You can monitor your application and database performance for a period of time to ensure stable performance. When you’re ready to run independently on AG2 (the RDS Custom for SQL Server cluster), remove the on-premises dependencies.

On FP (RDS Custom, old GP), run the following command to stop data synchronization:

ALTER DATABASE [<database_name>] SET HADR OFF

At this point, AG2’s databases are online and AG1’s databases are in restoring state.

Drop the distributed availability group. It’s recommended to run the following code on GP (old FP) first, followed by FP (old GP):

DROP AVAILABILITY GROUP [<distributed_ag>]

Decommission the on-premises cluster

Now that you have removed the on-premises dependencies from AG2, you can decommission the on-premises database, instance and server accordingly.

Clean up

To avoid incurring future charges, delete the resources you created as part of this post:

To delete the EC2 instance, refer to Terminate your instance.
To delete the DB instance, refer to Deleting a DB instance.

Summary

SQL Server Always On availability groups provide a cost-effective high availability and disaster recovery feature. A distributed availability group is a special type of availability group that enables you to migrate cluster databases spanning across two or more separate availability groups.

In this post, we combined the simplicity of Always On availability groups and the flexibility of distributed availability groups and demonstrated a solution for migrating your on-premises SQL Server cluster to an RDS Custom for SQL Server cluster.

Try out the solution and if you have any comments or questions, leave them in the comments section. Stay tuned for further posts on more SQL Server migration solutions.

About the Authors

Aravind Hariharaputran is Database Consultant with the Professional Services team at Amazon Web Services. He is passionate about databases in general with Microsoft SQL Server as his specialty. He helps build technical solutions that assist customers to migrate and optimize their on-premises database workload to the AWS Cloud. He enjoys spending time with family and playing cricket.

Jeril Jose is Database Specialist Consultant with over 14 years of experience in Microsoft SQL Server and other database technologies. He helps customers architect, migrate and optimize their database solutions to AWS. Prior to joining AWS, he supported production and mission-critical database implementation across financial and retail segments.

Brett Haralson is a Database Consultant with Professional Services team at Amazon Web Services with a specialization in SQL Server. He helps customers to build high-available, cost-effective database solutions and migrate their large-scale SQL Server databases to AWS.