Build a solution for data migration between on-premises and Amazon Aurora databases hosted in isolated VPCs using AWS DMS

Many customers migrating their on-premises database workloads to the AWS Cloud select AWS Database Migration Service (AWS DMS) as their tool of choice to quickly migrate databases. AWS DMS is a service that supports homogeneous and heterogeneous migrations between different database platforms. Although setup and configuration of AWS DMS is straightforward, selecting the right network topology to enable communication between source and target databases might require additional analysis. You must strike the right balance between performance, reliability, cost, and access requirements. If you face limitations on the number of available Classless Inter-Domain Routing (CIDR) ranges in your internal network, restrictions on which AWS services are allow listed, or internal security policies that prohibit direct access between on-premises and AWS databases, you must implement more creative solutions to migrate databases to AWS.

In this post, we present a solution that enables the migration of databases hosted on premises to isolated VPCs using AWS PrivateLink. This pattern allows you to establish network connectivity between the AWS DMS replication instance and databases hosted on VPCs with overlapping CIDR ranges, and eliminates the need to use AWS Transit Gateway or VPC peering via Amazon Virtual Private Cloud (Amazon VPC). Throughout this post, we discuss the proposed pattern in detail and provide Terraform scripts to help you build a proof of concept (POC) environment that can be tailored to match your specific requirements.

Solution overview

This solution demonstrates how to configure different AWS services to simulate customer environments consisting of AWS DMS instances hosted on VPCs without direct network connectivity to on-premises and target networks. The objective is not to provide a deep dive on the services used, but to act as a guideline on how to achieve connectivity for these services without the need for a direct peering relationship with the VPC hosting AWS DMS. Finally, to validate the architecture, we provide SQL statements to create sample database objects and insert data, and AWS Command Line Interface (AWS CLI) commands to migrate objects and data using AWS DMS. We use the following AWS services and resources in our proposed architecture:

Three VPCs (on premises, AWS DMS, and target)
Two Network Load Balancers (on premises to AWS DMS and AWS DMS to target)
Two AWS PrivateLink endpoints (on premises to AWS DMS and AWS DMS to target)
Two Amazon Aurora PostgreSQL-Compatible Edition databases (on premises and target)
Two AWS Secrets Manager database secrets
One AWS DMS instance
One Amazon Elastic Compute Cloud (Amazon EC2) instance (migration workstation)

The solution is represented in the following diagram.

Prerequisites

Implementation of this solution uses Terraform; therefore, we assume the following prerequisites have been satisfied prior to running the scripts:

Access to an AWS account
Terraform installed and configured with AWS credentials
Access to the GitHub repository containing the Terraform scripts

If all the prerequisites are in place, you can proceed to run the Terraform scripts to create the environment on your AWS account.

Provision the infrastructure

We have created Terraform scripts hosted in a Git repository to deploy the infrastructure. Complete the following steps:

Using git, clone the repository to your local machine:

git clone https://github.com/aws-samples/aws-dms-terraform.git

Navigate to the provisioning directory:

cd provisioning

Initialize the working directory that contains the Terraform configuration files:

terraform init

Create a run plan:

terraform plan -out=tfplan -var-file="../variables/dev/common.tfvars.json"

Run the actions proposed in the Terraform plan to create the infrastructure:

terraform apply "tfplan"

Deployment takes around 15–20 minutes to complete.

Creation of VPCs

The first stage of the Terraform deployment creates three VPCs. Each VPC uses similar CIDR ranges and there is no connectivity among them. The first VPC simulates the on-premises network and hosts the source Aurora PostgreSQL database. The second VPC hosts the AWS DMS and EC2 instances used for database migration purposes. The third VPC hosts the target Aurora PostgreSQL database. For a complete list of features and the steps necessary to create a VPC in your AWS account, refer to Get started with Amazon VPC.
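
Because the three VPCs are never peered, they can safely reuse overlapping CIDR ranges. The following is a minimal sketch of how one of the simulated networks might be declared; resource names, tags, and CIDR values are illustrative assumptions, not the repository's actual code:

# Illustrative only: all three VPCs can use the same or overlapping CIDR
# ranges because they are never peered or attached to a transit gateway.
resource "aws_vpc" "onprem" {
  cidr_block           = "10.0.0.0/16"
  enable_dns_support   = true
  enable_dns_hostnames = true

  tags = {
    Name = "onprem-vpc"
  }
}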

Creation of Aurora PostgreSQL clusters

The Terraform scripts create Aurora PostgreSQL clusters to simulate the source and target database environments. Each Aurora cluster is provisioned on a separate VPC without direct connectivity among them.
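
As a rough sketch (the attribute values and the referenced subnet group and security group are assumptions rather than the actual repository code), each cluster could be declared along these lines:

# Illustrative sketch of the source cluster; the subnet group and security
# group are assumed to be defined elsewhere in the configuration, and a
# matching aws_rds_cluster_instance resource would provide the writer node.
resource "aws_rds_cluster" "source" {
  cluster_identifier     = "source-aurora-pg"
  engine                 = "aurora-postgresql"
  master_username        = "postgres"
  master_password        = var.source_db_password
  db_subnet_group_name   = aws_db_subnet_group.onprem.name
  vpc_security_group_ids = [aws_security_group.onprem_db.id]
  skip_final_snapshot    = true
}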

Creation of Secrets Manager secrets

Two Secrets Manager secrets are provisioned to provide secure storage of source and target database credentials during the migration. For additional information on how to create secrets, refer to Create and manage secrets with AWS Secrets Manager.
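
When an AWS DMS endpoint reads its credentials from Secrets Manager, the secret value is a JSON document containing the username, password, host, and port. The following is a hedged sketch of how the source secret might be populated; the resource names and references are assumptions:

# Illustrative sketch; the JSON shape matches what AWS DMS expects from a
# Secrets Manager secret referenced by an endpoint.
resource "aws_secretsmanager_secret" "source_db" {
  name = "dms-demo/source-db"
}

resource "aws_secretsmanager_secret_version" "source_db" {
  secret_id = aws_secretsmanager_secret.source_db.id
  secret_string = jsonencode({
    username = "postgres"
    password = var.source_db_password
    # Assumed: the host points at the PrivateLink endpoint in the AWS DMS
    # VPC rather than the cluster itself, so DMS traffic never needs a
    # direct route to the source VPC.
    host = aws_vpc_endpoint.source.dns_entry[0].dns_name
    port = 5432
  })
}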

Creation of Network Load Balancers

Two Network Load Balancers are provisioned to forward connections generated on the AWS DMS instance to the appropriate source or target database.
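
The pattern on each side is an internal NLB with a TCP listener on port 5432 forwarding to the database. The following is a minimal sketch for the source side (subnet and VPC references are assumptions; the target side mirrors it):

# Illustrative sketch: an internal NLB in the source VPC relaying
# TCP 5432 to the Aurora PostgreSQL endpoint.
resource "aws_lb" "source_db" {
  name               = "source-db-nlb"
  internal           = true
  load_balancer_type = "network"
  subnets            = aws_subnet.onprem_private[*].id
}

resource "aws_lb_target_group" "source_db" {
  name        = "source-db-tg"
  port        = 5432
  protocol    = "TCP"
  target_type = "ip" # the Aurora endpoint is registered by IP address
  vpc_id      = aws_vpc.onprem.id
}

resource "aws_lb_listener" "source_db" {
  load_balancer_arn = aws_lb.source_db.arn
  port              = 5432
  protocol          = "TCP"

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.source_db.arn
  }
}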

Creation of VPC endpoints

Two VPC endpoints are provisioned in the AWS DMS VPC to point to the source and target Network Load Balancers through their respective endpoint services; a combined sketch of this pairing follows the next section.

Creation of VPC endpoint services

Two VPC endpoint services are provisioned, one in the source VPC and one in the target VPC. Each one is associated with their respective Network Load Balancer.
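
Together, the endpoint service and the interface endpoint form the PrivateLink pairing that replaces peering. The following is a minimal sketch for the source side (resource names are assumptions; the target side mirrors it):

# Illustrative sketch: the endpoint service lives in the source VPC and
# fronts its NLB; the interface endpoint lives in the AWS DMS VPC.
resource "aws_vpc_endpoint_service" "source" {
  acceptance_required        = false
  network_load_balancer_arns = [aws_lb.source_db.arn]
}

resource "aws_vpc_endpoint" "source" {
  vpc_id             = aws_vpc.dms.id
  service_name       = aws_vpc_endpoint_service.source.service_name
  vpc_endpoint_type  = "Interface"
  subnet_ids         = aws_subnet.dms_private[*].id
  security_group_ids = [aws_security_group.dms_endpoint.id]
}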

Creation of an AWS DMS instance

An AWS DMS instance is provisioned to automate the migration of the on-premises database to the target AWS database. After the instance has been created, an AWS Identity and Access Management (IAM) role is created to enable AWS DMS to access the respective source and target secrets. Next, two AWS DMS endpoints are provisioned for the source and target databases using the VPC endpoints previously created for the respective Network Load Balancers.
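
The following is a hedged sketch of how the replication instance and the source endpoint might be wired together; the instance class, names, and the referenced role, secret, and subnet group are assumptions:

# Illustrative sketch; the secret, IAM role, and subnet group are assumed
# to be defined elsewhere in the configuration.
resource "aws_dms_replication_instance" "this" {
  replication_instance_id     = "dms-demo"
  replication_instance_class  = "dms.t3.medium"
  allocated_storage           = 50
  replication_subnet_group_id = aws_dms_replication_subnet_group.this.id
  vpc_security_group_ids      = [aws_security_group.dms.id]
}

resource "aws_dms_endpoint" "source" {
  endpoint_id   = "source-endpoint"
  endpoint_type = "source"
  engine_name   = "aurora-postgresql"
  database_name = "demo_db"
  secrets_manager_arn             = aws_secretsmanager_secret.source_db.arn
  secrets_manager_access_role_arn = aws_iam_role.dms_secrets_access.arn
}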

Creation of EC2 instance

An EC2 instance is provisioned by the Terraform deployment to act as a bastion host from which SQL statements are run via the PostgreSQL client, and AWS CLI commands to interact with AWS DMS are run. AWS Systems Manager Session Manager is enabled to securely connect to the EC2 instance. The EC2 instance is provisioned with the required PostgreSQL and AWS CLI software to simplify the migration activities.
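
The following is a minimal sketch of such a bastion; the AMI data source, subnet, and instance profile are assumptions, and the profile is assumed to attach the AmazonSSMManagedInstanceCore managed policy so Session Manager can reach the instance:

# Illustrative sketch: no SSH key pair or inbound ports are needed because
# access goes through AWS Systems Manager Session Manager.
resource "aws_instance" "bastion" {
  ami                  = data.aws_ami.amazon_linux.id
  instance_type        = "t3.micro"
  subnet_id            = aws_subnet.dms_private[0].id
  iam_instance_profile = aws_iam_instance_profile.ssm.name

  user_data = <<-EOF
    #!/bin/bash
    dnf install -y postgresql15
  EOF
}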

Validation of Terraform build

After the environment has been deployed, we can validate that all the AWS services have been properly provisioned by inspecting the deployed infrastructure with the following command:

terraform output

You can expect to see an output similar to the following screenshot.

Validate the solution

To validate the environment we just built using the Terraform modules, we perform a test migration using AWS DMS. The workflow involves the creation of a set of tables on the source database, insertion of data records on the new tables, and finally the creation and run of the AWS DMS migration task. All the steps listed in the following sections are run from the EC2 bastion host.

Prepare the source database

To prepare the source database, complete the following steps:

In the output of the terraform output command, take note of the following four fields (you need them to connect to each database):

secret_source_db = password of the source database
secret_target_db = password of the target database
source_vpc_endpoint = endpoint to connect to the source database
target_vpc_endpoint = endpoint to connect to the target database
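
Assuming these outputs are plain strings, you can also read them directly from the provisioning directory with terraform output -raw and capture them in shell variables for the commands that follow; for example:

export SOURCE_EP=$(terraform output -raw source_vpc_endpoint)
export TARGET_EP=$(terraform output -raw target_vpc_endpoint)

You can then substitute $SOURCE_EP and $TARGET_EP for the endpoint names in the psql commands that follow.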

Use the PostgreSQL client (psql) in the bastion host to connect to the source database via the Network Load Balancer VPC endpoint:

psql -h source_vpc_endpoint -p 5432 -d postgres -U postgres

Create a database, schema, and table objects:

CREATE DATABASE demo_db;
\c demo_db
CREATE SCHEMA demo_schema;
CREATE TABLE demo_schema.demo_accounts(
accountNumber BIGINT NOT NULL,
firstName VARCHAR(20),
lastName VARCHAR(20),
creationTime TIME NOT NULL
);

Describe the table:

\dt+ demo_schema.demo_accounts

The output should look similar to the following screenshot.

Insert sample records in the newly created table:

INSERT INTO demo_schema.demo_accounts VALUES
(1,'John','Smith',current_timestamp),
(2,'Mary','Doe',current_timestamp),
(3,'Tom','Good',current_timestamp),
(4,'Peter','Flynn',current_timestamp);

The insert command returns an output similar to the following code:

INSERT 0 4

Verify all the records were successfully created:

SELECT * FROM demo_schema.demo_accounts;

The output should look similar to the following screenshot.

Connect to the target database

Before we start the migration, verify that you can connect to the target database.

Use the PostgreSQL client installed in the bastion host to connect to the target postgres database via the target VPC endpoint:

psql -h target_vpc_endpoint -p 5432 -d postgres -U postgres

Create the target database by running the following statement:

CREATE DATABASE demo_db;

Create the AWS DMS replication task

AWS DMS replication tasks are the core component of the database migration process. They contain all the metadata required to instruct AWS DMS on how to perform the migration. Therefore, before we create the replication task, let's create two JSON configuration files to use for the task settings and table mappings.

Create the task-settings.json file using the following sample JSON document:

{
  "FullLoadSettings": {
    "TargetTablePrepMode": "TRUNCATE_BEFORE_LOAD"
  },
  "Logging": {
    "EnableLogging": true
  }
}

The TargetTablePrepMode setting indicates how to handle loading of the target tables at full-load startup. The available options are as follows:

DO_NOTHING – Data and metadata of the existing target table aren’t affected
DROP_AND_CREATE – The existing table is dropped and a new table is created in its place
TRUNCATE_BEFORE_LOAD – Data is truncated without affecting the table metadata

For this example, we chose the TRUNCATE_BEFORE_LOAD option given the simplicity of the test environment, but you can select the option that better fits the needs of your organization.

Create the table-mappings.json file using the following JSON document:

{
  "rules": [
    {
      "rule-type": "selection",
      "rule-id": "1",
      "rule-name": "1",
      "object-locator": {
        "schema-name": "demo_schema",
        "table-name": "demo_accounts"
      },
      "rule-action": "include",
      "filters": []
    }
  ]
}

We configure a few environment variables to store information such as Amazon Resource Names (ARNs) and the AWS Region in which the resources are deployed:

export AWS_REGION=us-east-1

export sourceEndpointARN=$(aws dms describe-endpoints --region $AWS_REGION --filters Name=endpoint-type,Values=SOURCE --query 'Endpoints[0].EndpointArn' --output text)

export targetEndpointARN=$(aws dms describe-endpoints --region $AWS_REGION --filters Name=endpoint-type,Values=TARGET --query 'Endpoints[0].EndpointArn' --output text)

export dmsInstanceARN=$(aws dms describe-replication-instances --region $AWS_REGION --query 'ReplicationInstances[0].ReplicationInstanceArn' --output text)

export dmsRepTask=dms-task-demo

Run the following AWS CLI command to create an AWS DMS migration task called dms-task-demo:

aws dms create-replication-task --replication-task-identifier $dmsRepTask --source-endpoint-arn $sourceEndpointARN --target-endpoint-arn $targetEndpointARN --replication-instance-arn $dmsInstanceARN --migration-type full-load --table-mappings file://table-mappings.json --replication-task-settings file://task-settings.json --region $AWS_REGION

Wait approximately 60 seconds for the task to be created and then run the AWS DMS migration task:

export dmsRepTaskARN=$(aws dms describe-replication-tasks --filters Name=replication-task-id,Values=$dmsRepTask --query "ReplicationTasks[*].ReplicationTaskArn" --region $AWS_REGION --output text)

aws dms start-replication-task --start-replication-task-type start-replication --replication-task-arn $dmsRepTaskARN --region $AWS_REGION

Monitor the migration task while it is running. You can run the following command in a loop at different intervals:

aws dms describe-replication-tasks --filters Name=replication-task-id,Values=$dmsRepTask --region $AWS_REGION --output table
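
For example, the following illustrative loop polls the task status and full-load progress every 30 seconds until you interrupt it:

# Press Ctrl+C to stop polling; adjust the interval to your needs
while true; do
  aws dms describe-replication-tasks \
    --filters Name=replication-task-id,Values=$dmsRepTask \
    --query "ReplicationTasks[0].[Status,ReplicationTaskStats.FullLoadProgressPercent]" \
    --region $AWS_REGION --output text
  sleep 30
done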

When the task is complete, connect to the target database using a PostgreSQL client:

psql -h target_vpc_endpoint -p 5432 -d postgres -U postgres

Describe the target table to validate that it was migrated successfully:

\dt+ demo_schema.demo_accounts

The output should look similar to the following screenshot.

Query the contents of the demo_schema.demo_accounts table to verify that all the records match those stored in the source database:

SELECT * FROM demo_schema.demo_accounts;

The output should look similar to the following screenshot.

After you verify the contents of the database, delete the replication task by running the following command:

aws dms delete-replication-task --replication-task-arn $dmsRepTaskARN --region $AWS_REGION

For more information about AWS DMS best practices, refer to Best practices for AWS Database Migration Service.

Clean up

When you finish your test, make sure to remove all the created resources so you avoid incurring future costs. Run the following command to destroy all the objects managed by your Terraform configuration:

terraform destroy -var-file="../variables/dev/common.tfvars.json"

Conclusion

In this post, we presented a solution for implementing communication between VPCs using VPC endpoints and Network Load Balancers for performing database migrations using AWS DMS in a secure and efficient way. We also provided all the required Terraform automation scripts and AWS CLI commands to provision the different AWS services, and SQL statements to create and query database objects. We encourage you to try this solution.

As always, we welcome your feedback, so please leave your comments.

About the Authors

Xavier Navas is a DevOps Consultant at Amazon Web Services. He works with customers in their journey to the cloud with a focus on building complex CI/CD pipelines, automation, and infrastructure as code. In his spare time, Xavier enjoys going to the gym, boxing, and learning new technologies.

Israel Oros is a Database Migration Consultant at AWS. He works with customers in their journey to the cloud with a focus on complex database migration programs. In his spare time, Israel enjoys traveling to new places with his wife and riding his bicycle whenever weather permits.
