Customers running database workloads on Amazon DocumentDB (with MongoDB compatibility) might need to replicate data from one AWS account to another. Doing so lets you refresh development or QA environments faster, provide a production-like environment for troubleshooting current production issues, and share data with partners.
Amazon DocumentDB supports copying snapshots between AWS accounts. You can use this feature to copy data to other accounts, but the copy time depends on the snapshot size: larger snapshots take longer because each copy transfers the full snapshot. An alternative approach is to use AWS Database Migration Service (AWS DMS) or Amazon DocumentDB change streams to replicate data across accounts.
AWS DMS makes it easy to migrate relational and NoSQL databases to AWS. With AWS DMS, you can perform one-time migrations, and you can replicate ongoing changes to keep sources and targets in sync. AWS DMS supports Amazon DocumentDB as source and target endpoints to migrate data to and from Amazon DocumentDB. For more information, see Using Amazon DocumentDB as a target for AWS Database Migration Service.
In this post, I use Amazon DocumentDB as source and target database clusters, AWS DMS for data replication, and VPC peering with Amazon Virtual Private Cloud (Amazon VPC) to connect two AWS accounts. Later, I use a sample dataset to verify real-time data replication between the source and target.
The solution in this post addresses the following main concerns:
Real-time data replication between AWS accounts and AWS Regions
Creating a global solution for data sharing with partners, environment refreshes, and quickly provisioning a production-like staging area for troubleshooting
The following architecture diagram illustrates my solution.
The implementation of this solution consists of the following tasks using various AWS services:
Create a cross-Region VPC peering connection between two AWS accounts.
Create an AWS DMS task to start a full load and change data capture (CDC) replication from the source Amazon DocumentDB database in one Region to the target Amazon DocumentDB database in a different Region.
Run the replication task.
Verify the changes are replicated from the source to the target.
This post assumes the following AWS resources have already been provisioned:
Two AWS accounts, each with an Amazon DocumentDB cluster: Account A in the source Region and Account B in the target Region. For more information, see Amazon DocumentDB Quick Start Using AWS CloudFormation.
An AWS DMS replication instance in Account A. For more information, see Choosing the right AWS DMS replication instance for your migration.
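An AWS DMS source endpoint and target endpoint in Account A, where the replication instance runs: the source endpoint points to the Amazon DocumentDB cluster in Account A, and the target endpoint points to the cluster in Account B. For the source endpoint, make sure that the metadata mode is set to document.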
A Linux-based workstation with a mongo shell in both accounts.
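To make the document metadata mode requirement concrete, here is a minimal Python sketch that assembles the keyword arguments for the DMS create_endpoint call in the AWS SDK for Python (boto3). It assumes both endpoints are created in Account A, where the replication instance runs. The helper name docdb_endpoint_params, the host name, and the credentials are hypothetical placeholders, and TLS options (certificate import, SslMode) are left out for brevity. In DMS, document mode corresponds to NestingLevel set to "none" in DocDbSettings.

```python
def docdb_endpoint_params(identifier, endpoint_type, server_name,
                          database_name, username, password, port=27017):
    """Build keyword arguments for boto3's DMS create_endpoint call.

    Hypothetical helper: every value (identifier, host, credentials) is a
    placeholder you supply; TLS settings are intentionally omitted.
    """
    if endpoint_type not in ("source", "target"):
        raise ValueError("endpoint_type must be 'source' or 'target'")
    settings = {
        "ServerName": server_name,
        "Port": port,
        "DatabaseName": database_name,
        "Username": username,
        "Password": password,
    }
    if endpoint_type == "source":
        # NestingLevel "none" is document metadata mode: each document is
        # carried over as a whole instead of being flattened into columns.
        settings["NestingLevel"] = "none"
    return {
        "EndpointIdentifier": identifier,
        "EndpointType": endpoint_type,
        "EngineName": "docdb",
        "DocDbSettings": settings,
    }


if __name__ == "__main__":
    import boto3  # assumes credentials for Account A are configured

    dms = boto3.client("dms", region_name="us-east-1")  # hypothetical Region
    source = dms.create_endpoint(**docdb_endpoint_params(
        "docdb-source", "source",
        "source-cluster.example.com",  # hypothetical cluster endpoint
        "migtest", "admin_user", "change-me"))
    print(source["Endpoint"]["EndpointArn"])
```

In a real setup you would typically reference an AWS Secrets Manager secret rather than embed a password.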
Set up VPC peering
A VPC peering connection is a networking connection between two VPCs that enables you to route traffic between them using private IPv4/IPv6 addresses. Instances in either VPC can communicate with each other as if they are within the same network. A VPC peering connection helps you facilitate the transfer of data between VPC resources, including Amazon Elastic Compute Cloud (Amazon EC2) instances and Amazon DocumentDB databases that run in different Regions. It’s important to note that you can’t create a VPC peering connection between VPCs that have matching or overlapping IPv4 or IPv6 CIDR blocks.
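Because overlapping CIDR blocks rule out peering, it can help to check both VPCs before you start. The following sketch uses only Python's standard ipaddress module; the function name cidrs_overlap and the sample CIDR blocks are illustrative, and you would supply the CIDR blocks actually associated with each VPC.

```python
import ipaddress


def cidrs_overlap(requester_cidrs, accepter_cidrs):
    """Return the (requester, accepter) CIDR pairs that overlap.

    An empty list means the CIDR blocks do not prevent peering. IPv4 and
    IPv6 blocks are only compared with blocks of the same IP version.
    """
    conflicts = []
    for a in requester_cidrs:
        net_a = ipaddress.ip_network(a)
        for b in accepter_cidrs:
            net_b = ipaddress.ip_network(b)
            if net_a.version == net_b.version and net_a.overlaps(net_b):
                conflicts.append((a, b))
    return conflicts


if __name__ == "__main__":
    # Illustrative CIDR blocks for the Account A and Account B VPCs.
    print(cidrs_overlap(["10.0.0.0/16"], ["10.1.0.0/16"]))
```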
In this step, I create a VPC peering connection between two VPCs in different Regions that belong to two different accounts. Start by signing in to the AWS Management Console in Account A.
From the Amazon VPC console, choose Peering Connections.
Choose Create Peering Connection.
Name the peering connection docdb-vpc-peering.
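Under Select a local VPC to peer with, for VPC ID (Requester), choose the VPC in Account A that hosts the source Amazon DocumentDB cluster.
For Account, choose Another account and enter the account ID of Account B (the target account).
For Region, choose Another Region and select the target Region.
For VPC ID (Accepter), enter the ID of the VPC in Account B that hosts the target Amazon DocumentDB cluster.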
Leave the Tags section at the default settings.
Choose Create peering connection.
Sign in to the console in Account B and switch to the target Region. On the Amazon VPC console, under Virtual Private Cloud, choose Peering Connections.
Choose the new peering connection, which is in the Pending Acceptance state.
On the Actions menu, choose Accept Request.
Choose Yes, Accept.
Choose Close to complete the setup.
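The console steps above map onto two Amazon EC2 API calls: CreateVpcPeeringConnection in Account A and AcceptVpcPeeringConnection in Account B. The sketch below assumes a boto3 EC2 client for each account is passed in; the helper names, the named profiles account-a and account-b, and the Regions, VPC IDs, and account ID in the main block are all hypothetical.

```python
def request_peering(requester_ec2, vpc_id, peer_vpc_id, peer_account_id,
                    peer_region, name="docdb-vpc-peering"):
    """Create the cross-account, cross-Region peering request from Account A."""
    resp = requester_ec2.create_vpc_peering_connection(
        VpcId=vpc_id,
        PeerVpcId=peer_vpc_id,
        PeerOwnerId=peer_account_id,
        PeerRegion=peer_region,
        TagSpecifications=[{
            "ResourceType": "vpc-peering-connection",
            "Tags": [{"Key": "Name", "Value": name}],
        }],
    )
    return resp["VpcPeeringConnection"]["VpcPeeringConnectionId"]


def accept_peering(accepter_ec2, peering_id):
    """Accept the pending request with a client for Account B in the target Region."""
    resp = accepter_ec2.accept_vpc_peering_connection(
        VpcPeeringConnectionId=peering_id
    )
    return resp["VpcPeeringConnection"]["Status"]["Code"]


if __name__ == "__main__":
    import boto3

    # Hypothetical named profiles and Regions for the two accounts.
    requester = boto3.Session(profile_name="account-a",
                              region_name="us-east-1").client("ec2")
    accepter = boto3.Session(profile_name="account-b",
                             region_name="us-west-2").client("ec2")
    pcx = request_peering(requester, "vpc-aaaa", "vpc-bbbb",
                          "111122223333", "us-west-2")
    print(pcx, accept_peering(accepter, pcx))
```

Right after the request is created, the connection can briefly be in the initiating-request state, so in practice you may need to wait until Account B sees it as pending-acceptance before accepting it.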
You can verify that the status of this VPC peering connection is Active in both accounts.
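A peering connection does not route traffic on its own: each VPC's route tables typically also need a route to the other VPC's CIDR block through the peering connection, and the security groups need to allow the Amazon DocumentDB port (27017 by default). The following sketch waits for the active status with DescribeVpcPeeringConnections and then adds one such route with CreateRoute. The helper names and IDs are hypothetical, and the sleep parameter exists only so the polling loop can be exercised without waiting.

```python
import time

# Peering states from which the connection will not become active.
TERMINAL_STATES = {"failed", "rejected", "expired", "deleted", "deleting"}


def wait_for_active(ec2, peering_id, timeout=300, interval=10, sleep=time.sleep):
    """Poll DescribeVpcPeeringConnections until the connection is active."""
    deadline = time.monotonic() + timeout
    while True:
        resp = ec2.describe_vpc_peering_connections(
            VpcPeeringConnectionIds=[peering_id]
        )
        code = resp["VpcPeeringConnections"][0]["Status"]["Code"]
        if code == "active":
            return code
        if code in TERMINAL_STATES:
            raise RuntimeError(f"peering connection {peering_id} is {code}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"peering connection {peering_id} still {code}")
        sleep(interval)


def add_peering_route(ec2, route_table_id, peer_cidr, peering_id):
    """Send traffic for the peer VPC's CIDR block through the peering connection."""
    ec2.create_route(
        RouteTableId=route_table_id,
        DestinationCidrBlock=peer_cidr,
        VpcPeeringConnectionId=peering_id,
    )
```

You would run add_peering_route in both accounts: in Account A with Account B's CIDR block, and in Account B with Account A's.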
You must verify that you have Amazon DocumentDB clusters set up in the source and target accounts in order to complete the next steps. If they’re not set up, see Creating an Amazon DocumentDB Cluster.
Create a database migration task
An AWS DMS task is where all the work happens: here you specify which collections and schemas to migrate. Before you create a migration task, make sure that the source endpoint, target endpoint, and replication instance listed in the prerequisites already exist.
Before you start the replication, enable change streams on the source database so that the migration task can run in CDC mode. To do this, run the modifyChangeStreams admin command from the mongo shell on the source Amazon DocumentDB cluster, with database set to the source database (migtest in this post), collection set to an empty string (all collections), and enable set to true.
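The change-stream command can also be issued from Python. This minimal sketch assumes a pymongo Database object for the admin database of the source cluster is passed in; the mongo shell equivalent is db.adminCommand({modifyChangeStreams: 1, database: "migtest", collection: "", enable: true}). The helper names and the connection string in the main block are hypothetical.

```python
def change_stream_command(database, collection="", enable=True):
    """Build the Amazon DocumentDB modifyChangeStreams admin command.

    An empty collection name applies the setting to every collection in
    the database.
    """
    # The command name must be the first key of the command document.
    return {
        "modifyChangeStreams": 1,
        "database": database,
        "collection": collection,
        "enable": enable,
    }


def enable_change_streams(admin_db, database, collection=""):
    """Run the command against the admin database (for example, client.admin)."""
    return admin_db.command(change_stream_command(database, collection))


if __name__ == "__main__":
    from pymongo import MongoClient

    # Hypothetical connection string; Amazon DocumentDB uses TLS and does
    # not support retryable writes.
    client = MongoClient(
        "mongodb://user:password@source-cluster.example.com:27017/"
        "?tls=true&tlsCAFile=global-bundle.pem&retryWrites=false"
    )
    print(enable_change_streams(client.admin, "migtest"))
```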
In this step, I create a migration task that starts the initial migration between the source and target data stores and captures ongoing changes to the source data store after completing the initial migration to the target data store. Let’s start by accessing the console in Account A.
On the AWS DMS console, choose Replication instances and verify the replication instance.
Choose Endpoints and verify the source and target endpoints.
In the navigation pane, choose Database migration tasks.
Choose Create task.
In the Task configuration section, for Task identifier, enter a name.
For Descriptive Amazon Resource Name (ARN), provide a name.
For Replication instance, choose an existing replication instance.
For Source database endpoint, choose an existing endpoint.
For Target database endpoint, choose an existing endpoint.
For Migration type, choose Migrate existing data and replicate ongoing changes.
In the Task settings section, for Editing mode, choose Wizard.
For Target table preparation mode, choose Drop tables on target.
For Stop task after full load completes, choose Don’t stop.
Choose Include LOB columns in replication and choose Limited LOB Mode.
For Maximum LOB size (KB), enter 32.
Leave Enable validation unselected.
Select Enable CloudWatch logs and leave the associated fields at their default.
Under Advanced task settings, for Create control table in target using schema, enter migtest.
For History timeslot, enter 5.
Select Apply exceptions, Replication status, Suspend tables, and Replication history.
In the Table mappings section, for Editing mode, choose Wizard.
For Selection rules, choose Add new selection rule.
For Schema, enter migtest.
For Table name, enter %.
For Action, choose Include.
Leave Premigration assessment unselected.
In the Migration task startup configuration section, for Start migration task, choose Manually later.
Optionally, apply tags in the Tags section.
Choose Create task.
At this stage, the migration task is being created.
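For repeatable setups, the wizard choices above can be expressed as the JSON documents that the DMS CreateReplicationTask API accepts. The following sketch reflects my reading of the DMS table-mapping and task-settings reference: the selection rule mirrors Schema = migtest, Table name = %, Action = Include, and the settings cover the LOB, target preparation, logging, validation, and control-table choices. It assumes DMS applies defaults for any setting left out; the helper names are hypothetical, and the mapping of console labels to settings keys is worth checking against the current documentation.

```python
import json


def table_mappings(schema="migtest"):
    """Selection rule: include every collection (table-name %) in the schema."""
    return {
        "rules": [{
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "include-migtest",
            "object-locator": {"schema-name": schema, "table-name": "%"},
            "rule-action": "include",
        }]
    }


def task_settings(control_schema="migtest"):
    """Partial ReplicationTaskSettings mirroring the wizard choices."""
    return {
        "TargetMetadata": {
            "SupportLobs": True,          # Include LOB columns in replication
            "FullLobMode": False,
            "LimitedSizeLobMode": True,   # Limited LOB mode
            "LobMaxSize": 32,             # Maximum LOB size in KB
        },
        "FullLoadSettings": {
            "TargetTablePrepMode": "DROP_AND_CREATE",  # Drop tables on target
            "StopTaskCachedChangesApplied": False,     # Don't stop after full load
            "StopTaskCachedChangesNotApplied": False,
        },
        "Logging": {"EnableLogging": True},            # CloudWatch logs
        "ValidationSettings": {"EnableValidation": False},
        "ControlTablesSettings": {
            "ControlSchema": control_schema,
            "HistoryTimeslotInMinutes": 5,
            "HistoryTableEnabled": True,
            "SuspendedTablesTableEnabled": True,
            "StatusTableEnabled": True,
        },
    }


def create_task(dms, task_id, instance_arn, source_arn, target_arn):
    """Create a full load plus CDC task with a boto3 DMS client; returns its ARN."""
    resp = dms.create_replication_task(
        ReplicationTaskIdentifier=task_id,
        SourceEndpointArn=source_arn,
        TargetEndpointArn=target_arn,
        ReplicationInstanceArn=instance_arn,
        MigrationType="full-load-and-cdc",
        TableMappings=json.dumps(table_mappings()),
        ReplicationTaskSettings=json.dumps(task_settings()),
    )
    return resp["ReplicationTask"]["ReplicationTaskArn"]
```

As with the console choice Manually later, creating the task this way does not start it.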
Run the migration task
When the task is in Ready status, start the AWS DMS migration task.
On the AWS DMS console, select your migration task.
On the Actions menu, choose Restart/Resume.
Choose the Table statistics tab to see the current activity that’s happening under this task.
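Starting the task can be scripted as well. This sketch polls DescribeReplicationTasks until the task reports ready and then calls StartReplicationTask with StartReplicationTaskType set to start-replication, the value used for a task's first run. The helper names, timeout, and polling interval are assumptions.

```python
import time


def task_status(dms, task_arn):
    """Return the current status string of one replication task."""
    resp = dms.describe_replication_tasks(
        Filters=[{"Name": "replication-task-arn", "Values": [task_arn]}]
    )
    return resp["ReplicationTasks"][0]["Status"]


def start_when_ready(dms, task_arn, timeout=600, interval=15, sleep=time.sleep):
    """Wait for the task to reach "ready", then start it for the first time."""
    deadline = time.monotonic() + timeout
    status = task_status(dms, task_arn)
    while status != "ready":
        if status == "failed":
            raise RuntimeError(f"task {task_arn} failed")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_arn} still {status!r}")
        sleep(interval)
        status = task_status(dms, task_arn)
    # The API also accepts "resume-processing" and "reload-target" for
    # tasks that have already run.
    dms.start_replication_task(
        ReplicationTaskArn=task_arn,
        StartReplicationTaskType="start-replication",
    )
    return status
```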
Test the solution
At this stage, you’re ready to enter sample data in the source Amazon DocumentDB database (migtest). The following screenshot shows the source Amazon DocumentDB database before the sample data is inserted.
Similarly, in the mongo shell, connect to the target Amazon DocumentDB database (target-documentdb) and verify the latest data.
After you add the sample dataset on the source database under Account A, the AWS DMS replication task populates those changes to the target database under Account B, which is also in a different Region.
The following screenshot shows the insert statement run against the source database.
The following screenshot shows the data replicated in our target database.
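The insert-then-verify check can also be automated with a small polling loop. This sketch assumes pymongo Collection objects for the same collection on the source and target clusters; it inserts one document on the source and waits until a document with the same _id appears on the target. The function name, timeout, and polling interval are hypothetical, and how quickly the document appears depends on your workload and replication instance.

```python
import time


def insert_and_verify(source_coll, target_coll, doc, timeout=120, interval=5,
                      sleep=time.sleep):
    """Insert doc on the source, then poll the target for the same _id."""
    doc = dict(doc)  # insert_one adds _id to the dict it is given
    inserted_id = source_coll.insert_one(doc).inserted_id
    deadline = time.monotonic() + timeout
    while True:
        replicated = target_coll.find_one({"_id": inserted_id})
        if replicated is not None:
            return replicated
        if time.monotonic() >= deadline:
            raise TimeoutError(f"_id {inserted_id!r} not replicated in {timeout}s")
        sleep(interval)
```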
You can also verify that the changes are being captured and replicated from the source to the target database on the Quick view and compare page of the AWS DMS console.
The following screenshot shows the Table statistics tab.
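The figures on the Table statistics tab are also available through the DMS DescribeTableStatistics API. The following sketch collects them per collection and follows the Marker field for pagination; the helper name and the shape of the returned summary are hypothetical.

```python
def summarize_table_stats(dms, task_arn):
    """Return {"schema.table": {...counters...}} for one replication task."""
    summary = {}
    kwargs = {"ReplicationTaskArn": task_arn}
    while True:
        resp = dms.describe_table_statistics(**kwargs)
        for t in resp["TableStatistics"]:
            key = f'{t["SchemaName"]}.{t["TableName"]}'
            summary[key] = {
                "state": t.get("TableState"),
                "full_load_rows": t.get("FullLoadRows", 0),
                "inserts": t.get("Inserts", 0),
                "updates": t.get("Updates", 0),
                "deletes": t.get("Deletes", 0),
            }
        marker = resp.get("Marker")
        if not marker:
            return summary
        kwargs["Marker"] = marker  # fetch the next page
```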
You can also track Amazon CloudWatch metrics of this migration task on the CloudWatch metrics tab.
On the Overview details tab, choose View CloudWatch logs to see detailed information about the changes being replicated.
Besides using CloudWatch to monitor the replication task, it’s a good idea to monitor the activity of the AWS DMS replication instance and to use Amazon Simple Notification Service (Amazon SNS) to receive AWS DMS event notifications while your workload is running. If you plan to use ongoing replication, we recommend the Multi-AZ option for your AWS DMS replication instance, which provides high availability and failover support. To use AWS DMS most effectively, see the AWS DMS best practices for the most efficient ways to migrate your data.
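Both recommendations can be applied through the DMS API. The sketch below subscribes an existing SNS topic to events from one replication instance and turns on Multi-AZ for that instance. It leaves out EventCategories so that, as I understand the API, all event categories for the replication-instance source type are included. The helper names, subscription name, topic ARN, and instance identifiers are hypothetical.

```python
def subscribe_to_instance_events(dms, name, topic_arn, instance_id):
    """Send DMS events for one replication instance to an existing SNS topic."""
    return dms.create_event_subscription(
        SubscriptionName=name,
        SnsTopicArn=topic_arn,
        SourceType="replication-instance",
        SourceIds=[instance_id],
        Enabled=True,
    )


def enable_multi_az(dms, instance_arn, apply_immediately=False):
    """Turn on Multi-AZ for the replication instance."""
    return dms.modify_replication_instance(
        ReplicationInstanceArn=instance_arn,
        MultiAZ=True,
        ApplyImmediately=apply_immediately,
    )
```

With apply_immediately left at False, the Multi-AZ change is applied during the instance's next maintenance window.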
This post demonstrates how to replicate data across AWS accounts using Amazon DocumentDB clusters, AWS DMS, and VPC peering. You can also use this solution for data sharing and for database environment refreshes across accounts. You can further enhance this solution with AWS CloudFormation to automate spinning up the DMS resources and creating the migration task.
If you have any questions or comments about this post, please use the comments section. If you have any feature requests for Amazon DocumentDB, email us at [email protected]
About the author
Pragnesh Patel is a Database Migration Consultant at AWS. He works with customers in their journey to the cloud with a focus on database migrations. In his spare time, Pragnesh enjoys traveling to new places with his wife and kids.
AWS Database Blog