Amazon DocumentDB (with MongoDB compatibility) is a fast, scalable, highly available, and fully managed document database service that supports MongoDB workloads. The Amazon DocumentDB Migration Guide outlines three primary approaches for migrating from MongoDB to Amazon DocumentDB: offline, online, and hybrid. Although the migration guide refers to MongoDB, you can use the offline migration approach for Azure Cosmos DB as well. Of the three approaches in the guide, only the offline approach applies to Cosmos DB; the online and hybrid approaches do not.
In this post, I explain how you can use the Azure Cosmos DB to Amazon DocumentDB migration utility tool to migrate Azure Cosmos DB API for MongoDB (with v3.6 wire protocol) to Amazon DocumentDB using an online migration approach.
The Azure Cosmos DB to Amazon DocumentDB migration utility tool is an application created to migrate a Cosmos DB database to Amazon DocumentDB with minimal downtime. The tool keeps the target Amazon DocumentDB cluster in sync with the source Cosmos DB until the client applications are cut over to the Amazon DocumentDB cluster. It uses the change feed in Azure Cosmos DB to record the changes and replay them on the Amazon DocumentDB cluster.
To accomplish this goal, I use the following services:
Amazon DocumentDB – Stores the data migrated from Cosmos DB
Amazon DynamoDB – Stores metadata and tracking information of change feed data
AWS Lambda – Captures and saves the change feed data from Cosmos DB
AWS Secrets Manager – Stores the database credentials for use by a Lambda function
Amazon Simple Queue Service (Amazon SQS) – Sends input to the Lambda functions
Amazon Simple Storage Service (Amazon S3) – Stores the change feed data in JSON format
A high-level overview of the migration process is as follows:
Prepare the environment for migration:
Create an Amazon Elastic Compute Cloud (Amazon EC2) instance.
Install the required packages using yum.
Download the source code and binaries, and install the dependencies.
Create an S3 bucket and copy Lambda files using the AWS Command Line Interface (AWS CLI).
Create core resources using an AWS CloudFormation template.
Create Amazon DocumentDB resources using the CloudFormation template.
Save the Amazon DocumentDB connection string in Secrets Manager.
The migration process:
From the provided source code, run the migrator-app application to capture the change feed data.
Create a backup of the Cosmos DB cluster using mongodump.
Create indexes on the target cluster using the Amazon DocumentDB Index Tool.
Restore the backup on the target cluster using the mongorestore tool.
Configure the application settings to apply the change feeds on the target Amazon DocumentDB cluster.
Validate the target cluster is in sync with the source cluster.
The cutover process:
Stop the application from writing to the source Cosmos DB cluster.
Stop the migrator-app application that records the change feed data.
Restart the client applications with the connection string pointing to the Amazon DocumentDB cluster.
The following diagram illustrates the high-level architecture.
For this post, I provide CloudFormation templates to simplify the deployment of the required resources. The prerequisites for the CloudFormation templates are as follows:
An AWS environment with Amazon Virtual Private Cloud (Amazon VPC) containing three private subnets
Python modules from Python Package Index either preloaded or downloaded from a public repository (internet access required)
Additionally, the Cosmos DB cluster incurs higher activity than normal during the migration. Review the Request Units capacity needs for your Cosmos DB cluster.
Prepare the environment for migration
The migrator-app application supports migrating data from Cosmos DB’s API for MongoDB (v3.6 wire protocol). If the source cluster uses wire protocol v3.2, upgrade the source deployment and MongoDB application drivers to v3.6 or above.
Step 1a: Create an EC2 instance
From the AWS Management Console, create an EC2 instance in a private subnet of a VPC with settings as shown in the following screenshot. I attached the security groups to allow inbound SSH traffic on port 22 and inbound Amazon DocumentDB traffic on port 27017 from this instance. For more information on how to create the security groups, refer to Work with security groups.
I’m using the m5ad.xlarge instance type, with 4 vCPUs and 16 GB of RAM. If your source cluster has multiple collections with millions of documents, consider creating an EC2 instance with more vCPUs and RAM to take advantage of parallel processing.
Step 1b: Install the required packages using yum
Connect to the EC2 instance you just created using SSH and install the required yum packages using the following bash script. For more information on how to connect to an EC2 instance in a private subnet using a bastion, refer to Securely Connect to Linux Instances Running in a Private Amazon VPC.
Step 1c: Download the source code and binaries, and install the dependencies
Use the following bash script to download the cosmosdb-migrator tool binaries and install the Python module dependencies:
Step 1d: Create an S3 bucket and copy the Lambda files using the AWS CLI
The cloudformation/core-resources.yaml CloudFormation template requires that the Lambda functions and Lambda layers are uploaded to an S3 bucket. If you already have an S3 bucket and want to use it, upload the lambda/*.zip files to the /lambda/ path on your S3 bucket as shown in the following screenshot. Otherwise, create a new S3 bucket with a globally unique name and upload the files to the /lambda/ path.
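As a rough illustration of the upload step, the following Python sketch maps each local .zip file to a key under the lambda/ prefix and uploads it with the boto3 S3 client's upload_file method. The function names, the bucket name, and the local directory are placeholders of mine, not part of the migration utility.

```python
from pathlib import Path


def plan_lambda_uploads(paths, prefix="lambda/"):
    """Return (local_path, s3_key) pairs for every .zip file in paths."""
    zips = sorted(Path(p) for p in paths if str(p).endswith(".zip"))
    return [(str(p), prefix + p.name) for p in zips]


def upload_lambda_files(s3_client, bucket, paths):
    """Upload the planned files; s3_client is assumed to be boto3.client("s3")."""
    uploaded = []
    for filename, key in plan_lambda_uploads(paths):
        s3_client.upload_file(filename, bucket, key)
        uploaded.append(key)
    return uploaded


# Example usage (needs boto3 and AWS credentials; the bucket name is a placeholder):
#   import boto3
#   upload_lambda_files(boto3.client("s3"), "my-migration-bucket",
#                       Path("lambda").glob("*.zip"))
```

Files that don't end in .zip are skipped, so stray files in the directory aren't uploaded.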
Step 1e: Create core resources using a CloudFormation template
The cloudformation/core-resources.yaml CloudFormation template is a shared resource stack that you can reuse across multiple migrations from Cosmos DB to Amazon DocumentDB clusters. When this template runs successfully, all the required resources for the migration, such as the S3 bucket, Amazon SQS queues, Lambda functions, and DynamoDB tables are created and configured automatically.
Create a new stack using the cloudformation/core-resources.yaml template as shown in the following screenshot.
On the next screen, you specify the stack details as shown in the following screenshot.
Choose the VPC network and private subnets appropriate to your environment.
Specify the Amazon S3 bucket name that you used in Step 1d.
As a best practice, I recommend giving your Amazon DocumentDB cluster the same name as the source Cosmos DB cluster. This makes it easy to identify the mapping between the source and target during the migration process.
Review the core resources stack, then choose Deploy.
Confirm the CloudFormation deployment shows a status of CREATE_COMPLETE before continuing.
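If you prefer to check the stack status from a script instead of the console, a minimal sketch could look like the following. It relies on the describe_stacks call of the boto3 CloudFormation client; the helper names and the stack name are hypothetical.

```python
def stack_status(cfn_client, stack_name):
    """Return the StackStatus string reported by describe_stacks."""
    response = cfn_client.describe_stacks(StackName=stack_name)
    return response["Stacks"][0]["StackStatus"]


def is_stack_ready(cfn_client, stack_name):
    """True only when the stack has reached CREATE_COMPLETE."""
    return stack_status(cfn_client, stack_name) == "CREATE_COMPLETE"


# Example usage (needs boto3 and AWS credentials; the stack name is a placeholder):
#   import boto3
#   cfn = boto3.client("cloudformation")
#   print(stack_status(cfn, "my-core-resources-stack"))
#   # boto3 also offers a blocking waiter:
#   #   cfn.get_waiter("stack_create_complete").wait(StackName="my-core-resources-stack")
```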
Step 1f: Create Amazon DocumentDB resources using a CloudFormation template
The cloudformation/documentdb.yaml CloudFormation template helps you create an Amazon DocumentDB cluster with three compute instances. Use this template file to create a new Amazon DocumentDB cluster for every Cosmos DB cluster being migrated.
Create a new stack using the cloudformation/documentdb.yaml template as shown in the following screenshot.
On the next screen, you specify the stack details.
Enter a unique stack name for the migration.
Choose the VPC network, private subnets, and security group for Amazon DocumentDB.
For your instance type, I recommend using an Amazon DocumentDB instance type that is close to your Cosmos DB cluster size.
Enter an administrator password appropriate for your environment.
Confirm the CloudFormation deployment shows a status of CREATE_COMPLETE before continuing.
Step 1g: Save your Amazon DocumentDB connection string in Secrets Manager
To save your Amazon DocumentDB connection string, complete the following steps:
On the Amazon DocumentDB console, navigate to your cluster.
On the Connectivity & Security tab, choose Connect.
Choose Copy next to Connect to this cluster with an application.
Copy and paste the text in your preferred text editor and replace <insertYourPassword> with the password used in the previous step.
Save the connection string information in Secrets Manager as shown in the following code.
Update the values for AWS_DEFAULT_REGION, AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, <your-cluster-name>, and <your-documentdb-connection-string> with appropriate values for your environment. The Lambda function batch-request-reader uses this connection string to apply the change feed data on the target Amazon DocumentDB cluster.
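The two manual steps above (filling in the password and storing the connection string) can be sketched in Python as follows. This is an illustration under assumptions: I assume the secret is named after the cluster, and the helper names are mine. create_secret is the boto3 Secrets Manager client call. The password is percent-encoded because MongoDB connection strings require reserved characters such as @ or / in credentials to be escaped.

```python
from urllib.parse import quote_plus

PLACEHOLDER = "<insertYourPassword>"


def fill_password(connection_string, password):
    """Replace the console placeholder with a percent-encoded password."""
    if PLACEHOLDER not in connection_string:
        raise ValueError("connection string does not contain " + PLACEHOLDER)
    return connection_string.replace(PLACEHOLDER, quote_plus(password))


def save_connection_string(secrets_client, cluster_name, connection_string):
    """Store the connection string; the secret name is assumed to be the cluster name."""
    return secrets_client.create_secret(
        Name=cluster_name,
        Description="Amazon DocumentDB connection string for " + cluster_name,
        SecretString=connection_string,
    )


# Example usage (needs boto3 and AWS credentials; all values are placeholders):
#   import boto3
#   uri = fill_password("<your-documentdb-connection-string>", "<your-password>")
#   save_connection_string(boto3.client("secretsmanager"), "<your-cluster-name>", uri)
```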
The migration process
Step 2a: Start the migrator-app application
The next step in the live migration process is to use the migration application to capture the change feed data from the Cosmos DB cluster. The application saves the data in Amazon S3, then stores the metadata and tracking information in DynamoDB tables. Start the migration application using the following commands. Update the values for <your-cosmosdb-connection-string> and <your-cluster-name> with values appropriate for your cluster.
Keep the migrator-app application running until the cutover period. For large database migrations, I strongly recommend running the commands in a screen session or with the nohup command so that the migrator-app isn’t stopped when you log out.
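If you would rather launch the long-running process from Python than use screen or nohup, the following sketch shows one way to get a similar effect on Linux: the child is started in its own session, with its output redirected to a log file, so closing the SSH session doesn't stop it. The function name and log path are hypothetical, and the command you pass in is whatever you use to start the migrator-app.

```python
import subprocess


def start_detached(command, log_path):
    """Start command in a new session and append its output to log_path."""
    log_file = open(log_path, "ab")
    process = subprocess.Popen(
        command,
        stdin=subprocess.DEVNULL,
        stdout=log_file,
        stderr=subprocess.STDOUT,
        start_new_session=True,  # POSIX only: the child calls setsid()
    )
    log_file.close()  # the child holds its own copy of the file descriptor
    return process


# Example usage (the command list is a placeholder for your migrator-app command):
#   proc = start_detached(["<your-migrator-app-command>"], "migrator-app.log")
#   print("started with PID", proc.pid)
```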
After running the preceding command, you should observe an output similar to the following:
Step 2b: Create a backup of the Cosmos DB cluster using the mongodump tool
In a new terminal session, export the data and indexes from the source Cosmos DB cluster using the mongodump tool (see the following code). The time it takes to perform the dump and the size of the dump depend on the data size of the source Cosmos DB cluster. Make sure that the disk device where you’re exporting the data has enough free disk space to hold the mongodump output. Other factors that may impact the overall execution time include the speed of the network between the EC2 instance and the source cluster, and the CPU/RAM resources of the EC2 instance. Update the values for <your-cosmosdb-connection-string>, <your-cosmos-db-server>, <port-number>, <your-username>, and <your-password> with values appropriate for your cluster.
To minimize the impact of the migration to any workload on the source cluster’s primary, export the data using a secondary read preference. If your source cluster doesn’t have a secondary, exclude the --readPreference command line argument. If you have multiple collections to export, use the argument --numParallelCollections <number-of-cpu-cores> to dump multiple collections in parallel.
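To make the optional arguments concrete, here is a small Python sketch that assembles a mongodump argument list. The flags are standard mongodump options; the function name, its parameters, and the output directory are placeholders of mine. The secondary read preference and the parallel collection count are added only when requested, as described above.

```python
def build_mongodump_args(host, port, username, password, out_dir,
                         use_secondary=True, parallel_collections=None):
    """Return the argv list for a mongodump run against the source cluster."""
    args = [
        "mongodump",
        "--host", host,
        "--port", str(port),
        "--ssl",
        "--username", username,
        "--password", password,
        "--out", out_dir,
    ]
    if use_secondary:
        # Leave this out when the source cluster has no secondary.
        args.append("--readPreference=secondary")
    if parallel_collections:
        args += ["--numParallelCollections", str(parallel_collections)]
    return args


# Example usage (all values are placeholders):
#   import os, subprocess
#   subprocess.run(build_mongodump_args("<your-cosmos-db-server>", "<port-number>",
#                                       "<your-username>", "<your-password>", "dump",
#                                       parallel_collections=os.cpu_count()), check=True)
```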
Step 2c: Create indexes on the target cluster using the Amazon DocumentDB Index Tool
Use the Amazon DocumentDB Index Tool to create the required indexes on the Amazon DocumentDB cluster.
Step 2d: Restore the backup on the target cluster using the mongorestore tool
Restore the mongodump data from Step 2b with the following code. If you have multiple collections to import, use the argument --numParallelCollections <number-of-cpu-cores> to restore multiple collections in parallel. Increasing the value of the --numInsertionWorkersPerCollection argument to the number of vCPU cores on the Amazon DocumentDB cluster’s primary instance may increase the speed of the import. Update the values for <your-documentdb-server>, <number-of-vcpus>, <your-username>, and <your-password> with values appropriate for your cluster.
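The restore command can be sketched the same way. The flags below are standard mongorestore options; the function name, the CA file path, and the dump directory are assumptions for illustration. I add --noIndexRestore on the assumption that the indexes already exist from Step 2c.

```python
def build_mongorestore_args(host, username, password, dump_dir, ca_file,
                            parallel_collections=4, insertion_workers=1):
    """Return the argv list for a mongorestore run against the target cluster."""
    return [
        "mongorestore",
        "--host", host,
        "--ssl",
        "--sslCAFile", ca_file,
        "--username", username,
        "--password", password,
        "--numParallelCollections", str(parallel_collections),
        "--numInsertionWorkersPerCollection", str(insertion_workers),
        "--noIndexRestore",  # assumes indexes were created beforehand in Step 2c
        dump_dir,
    ]


# Example usage (all values are placeholders):
#   import subprocess
#   subprocess.run(build_mongorestore_args("<your-documentdb-server>", "<your-username>",
#                                          "<your-password>", "dump", "<path-to-ca-file>",
#                                          insertion_workers=4), check=True)
```

The defaults of 4 parallel collections and 1 insertion worker match mongorestore's own defaults, so calling the function without those arguments doesn't change the tool's behavior.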
Step 2e: Configure the event writer to apply the change feed data
The mongodump and mongorestore processes take time depending on the Cosmos DB and Amazon DocumentDB cluster configuration, and the size of the data and indexes being exported or imported. When the mongorestore step is complete, you should configure the migration application to start applying the change feed data on the target Amazon DocumentDB cluster.
The following commands help you configure the event writer to start processing the change feed data. Update the value of <your-cluster-name> with the value appropriate for your cluster.
You should observe an output similar to the following:
Step 2f: Validate the target cluster is in sync with the source cluster
The Lambda functions from the CloudFormation stack start applying the change feeds on the target Amazon DocumentDB cluster in the order in which they happened on the source. You can observe the status of the migration application using the following command to see how far the target cluster is behind the source cluster:
You should observe an output similar to the following:
The gap_in_seconds value represents the time gap between the source and target cluster operations. The time_gap_in_seconds value represents the time gap between the source and the target at the collection level. When the gap_in_seconds value is under 10 seconds, you can continue to the next step.
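The decision rule above can be written down as a short Python sketch. The shape of the input is an assumption on my part: I take the cluster-level gap_in_seconds as a number and the collection-level time_gap_in_seconds values as a dictionary keyed by collection name. The function names are hypothetical and not part of the migration utility.

```python
def is_in_sync(gap_in_seconds, threshold_seconds=10):
    """True when the cluster-level gap is under the threshold."""
    return gap_in_seconds < threshold_seconds


def lagging_collections(collection_gaps, threshold_seconds=10):
    """Names of collections whose time_gap_in_seconds is at or above the threshold.

    collection_gaps is assumed to map collection name -> time_gap_in_seconds.
    """
    return sorted(
        name for name, gap in collection_gaps.items() if gap >= threshold_seconds
    )


# Example usage with made-up numbers:
#   is_in_sync(4)                                       # proceed to the cutover
#   lagging_collections({"orders": 42, "users": 1})    # collections still catching up
```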
The cutover process
This process involves updating your source application to connect to the target Amazon DocumentDB cluster. Because the migration application has multiple components, the process is as follows:
Stop the applications connected to Cosmos DB or place them in read-only mode.
Wait for the configure --watch-status application to report that the gap_in_seconds value is 0.
Stop the migrator-app and configure --watch-status applications by stopping the Python processes (press Ctrl+C).
Stop the event writer by running the following commands:
Restart the client applications with the connection string pointing to the Amazon DocumentDB cluster endpoint.
After you perform the cutover steps successfully, your database is fully migrated to the Amazon DocumentDB cluster with minimal downtime.
Errors while running migrator-app
If the configure application with --watch-status isn’t making any progress, try stopping and restarting the application using the following commands:
If that still doesn’t fix the issue, search for error text on the log group details page on the Amazon CloudWatch console to identify what’s causing the issue.
CloudFormation template stack doesn’t make progress
If the core-resources.yaml CloudFormation stack doesn’t make progress while creating or deleting resources, AWS CloudFormation may be having trouble creating or deleting the EventSourceMapping resources. Sign in to the AWS CloudTrail console and examine the event history for any FAILED events, such as the following:
Capture the UUID from the log output and manually delete the resource using the AWS CLI:
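As a sketch of the same cleanup in Python, the following helpers pull UUID-shaped strings out of pasted log text and delete a mapping through the boto3 Lambda client's delete_event_source_mapping call. The helper names are mine, and the regular expression matches any UUID-formatted string, so confirm that the value you pass really belongs to the stuck EventSourceMapping before deleting it.

```python
import re

UUID_PATTERN = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)


def find_uuids(log_text):
    """Return the distinct UUID-shaped strings in log_text, in order of appearance."""
    seen = []
    for match in UUID_PATTERN.findall(log_text):
        if match not in seen:
            seen.append(match)
    return seen


def delete_event_source_mapping(lambda_client, uuid):
    """Delete one mapping; lambda_client is assumed to be boto3.client("lambda")."""
    return lambda_client.delete_event_source_mapping(UUID=uuid)


# Example usage (needs boto3 and AWS credentials):
#   import boto3
#   for uuid in find_uuids(open("cloudtrail-event.txt").read()):
#       print("candidate mapping:", uuid)
#   delete_event_source_mapping(boto3.client("lambda"), "<uuid-you-verified>")
```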
For more information on identifying which resource is blocking progress, see Why is my AWS CloudFormation stack stuck in the state CREATE_IN_PROGRESS, UPDATE_IN_PROGRESS, UPDATE_ROLLBACK_IN_PROGRESS, or DELETE_IN_PROGRESS?
In this post, I showed how you can perform a live migration of an Azure Cosmos DB API for MongoDB database to Amazon DocumentDB with minimal downtime. The Azure Cosmos DB to Amazon DocumentDB migration utility tool keeps the target Amazon DocumentDB cluster in sync with the changes on the source Cosmos DB cluster, and helps minimize the overall application downtime as you perform the migration. The source code referred to in this post is available in the GitHub repo.
If you have any questions or comments about this post, please share them in the comments. If you have any feature requests for Amazon DocumentDB, email us at [email protected].
About the author
Shyam Arjarapu is a Sr. Data Architect at Amazon Web Services and leads Worldwide Professional Services for Amazon DocumentDB. Shyam is passionate about solving customer problems and channels his energy into helping AWS professional service clients build highly scalable applications using Amazon DocumentDB. Before joining AWS, Shyam held a similar role at MongoDB and worked as Enterprise Solution Architect at JP Morgan & Chase, SunGard Financial Systems, and John Hancock Financial.
Ravi Tallury is a Principal Solution Architect at Amazon Web Services (AWS) and has over 25 years of experience in architecting and delivering IT solutions. Prior to joining AWS, he led solution architecture and enterprise architecture for the automotive and life sciences verticals.