Recently one of our customers, asked us to help them encrypt their unencrypted Amazon Relational Database Service (Amazon RDS) for PostgreSQL. In this post, we show a solution to create an encrypted database from their existing unencrypted database and cut over with the least disruption to applications. This solution uses database Snapshot and PostgreSQL logical replication.
Amazon RDS offers two types of snapshots automatic and manual. This solution uses a manual snapshot. PostgreSQL logical replication uses a publish and subscribe model with one or more subscribers, subscribing to one or more publications on a publisher node. Amazon RDS for PostgreSQL and Amazon Aurora PostgreSQL-Compatible Edition support logical replication with decoding plugins such as pglogical and pgoutput. Logical replication is commonly used for data migration from on premises to Amazon RDS or Amazon Aurora PostgreSQL-Compatible Edition.
As an Amazon RDS and Aurora security best practice, you must encrypt your databases and snapshots at rest and in transit. The encryption of database at rest provides an additional layer of protection from unauthorized access to data.
Converting an unencrypted RDS for PostgreSQL or Amazon Aurora PostgreSQL database to encrypted
For an encrypted database instance data on storage, transaction logs, backups, and snapshots are encrypted. There are two ways you can accomplish. One is by using the Amazon RDS console, which is a straightforward approach but requires some down time. If your applications are sensitive to down time duration, then you can follow the second approach and use logical replication to minimize downtime.
You can use the same solution for switching the encryption key from an AWS managed key to a customer managed key for an existing encrypted Amazon RDS for PostgreSQL or Amazon Aurora database.
Overview of solution
This solution carries out the conversion in four high-level steps:
Enable native PostgreSQL replication on the unencrypted source database and take a manual snapshot.
Restore the snapshot to a new database instance with encryption enabled.
Perform logical replication on the restored database for change data capture.
Cut over your application to the encrypted database and retire (or delete) the unencrypted database.
Prerequisites
You can try out the hands-on steps using Amazon RDS for PostgreSQL or Amazon Aurora PostgreSQL-Compatible Edition. For this post, we demonstrate the steps with Amazon RDS for PostgreSQL.
You must complete the following steps before starting:
Have an AWS account with permissions to create resources.
Have an unencrypted RDS for PostgreSQL DB cluster and PostgreSQL client utility to try out the steps discussed in this post. You can use an existing unencrypted database or create a new database. Make sure that Enable Encryption is unchecked (see the following screenshot).
Set up an Amazon Elastic Compute Cloud (Amazon EC2) instance with connectivity to the DB instance. For more information, refer to Creating and connecting to a PostgreSQL DB instance.
Set up publication on the source database
In this step, you set up a publication, which is set of changes generated from a table or group of tables. The source database is referred to as the publisher node.
To enable logical replication in Amazon RDS for PostgreSQL, modify a custom parameter group to set rds.logical_replication to 1 and attach the custom group to the database.
This step requires a reboot of the database instance. If your database instance is already setup with a custom parameter group, modify the aforementioned parameter and reboot the instance.
Verify the setup by running the following SQL statement:
Now you can create the publication. You can select the tables for which you would like to capture the changes on the replica. For this use case, you capture changes for all tables. Because a publication is associated with a single database, you need to create one publication per database. In this post, we are assuming that there is a single database.
Start a client session to the database using any PostgreSQL client application (such as psql) and run the following command:
Next, you create the replication slot, which is a where the publisher nodes retain the WAL logs that are needed by the replicas. The subscriber nodes read the streamed changes from the replication slot to update the data. You run the system function pg_replication_logical_replication_slot() to create the replication slot. For logical decoding, you use the pgoutput decoding plugin that comes prepackaged with PostgreSQL.
Create the replication slot with the following code:
Backup and restore to create an encrypted database
In this step, take a backup and restore it to a new database instance. The new instance will have encryption enabled. Note that there is a minor difference between Amazon RDS for PostgreSQL and Amazon Aurora PostgreSQL-Compatible Edition from a restoration perspective. To create an encrypted Aurora cluster, you can restore from an unencrypted snapshot with encryption enabled. In the case of Amazon RDS, you need to carry out an additional step before restoration. This step involves creation of an encrypted snapshot from the unencrypted snapshot. The difference is depicted in the following figure.
Note that the steps listed here are for Amazon RDS. The following diagram depicts the steps carried out in this phase.
Complete the following steps:
Create an RDS snapshot using the Amazon RDS console or the following code via the AWS Command Line Interface (AWS CLI):
In this next step, you need to provide the Amazon Key Management Service (AWS KMS) key for database encryption. The KMS key you use can be AWS managed key or a customer managed key. In this post, you use the AWS managed default RDS encryption key (alias – aws/rds).
Create an encrypted snapshot using the following code:
Now you can restore the database from the encrypted snapshot. Make sure to enable the PostgreSQL logs; you need the log sequence number (LSN) in a later step. You can carry out this step using the Amazon RDS console or the AWS CLI, as documented in this post.
Restore the database with the following code. Specify the VPC, subnet group and security group for your encrypted database. To ensure that the rds.logical_replication parameter is set up in the new database, use the same custom parameter group as your source database.
Set up replication between unencrypted and encrypted databases
At this stage your database is updated to the latest LSN available in the snapshot. Now you set up the subscription on the restored database instance to receive the database changes. This will replicate the changes that occurred after the snapshot creation and will continue to stream all changes. The following diagram depicts the steps taken in this phase.
Start by verifying the replication setup on the encrypted database:
Create a subscription that will receive changes from the encrypted database:
By default, the subscription will be disabled; you enable it in a later step.
Get the LSN from the PostgreSQL logs
You can get the LSN by opening the PostgreSQL logs in the Amazon RDS console, Amazon CloudWatch console, or AWS CLI. After the instance is restored, navigate to the AWS console and go to the Logs and Events tab. Scroll down to the most recent log file. Open the log file and look for “:LOG: invalid record length”. The following example snippet shows how to find the LSN.
You now use the PostgreSQL system administration function pg_replication_origin_advances to advance the current confirmed position of the replication slot.
Connect to the encrypted database using a PostgreSQL client such as psql and run the following commands:
Enable the replication:
To Verify that replication is working, use the SQL client to ensure that changes are showing up in the encrypted database:
You should see lsn_distance = 0.
Keep in mind that the duration of full synchronization will depend on the amount of data changes made between the creation of the snapshot and enabling of replication.
When your new encrypted database is in sync with old unencrypted database, you’re ready for cutover.
Switch over applications to the encrypted database
At this time, you can repoint your applications to encrypted database using the guidance below.
Disconnect your applications from the database and stop the unencrypted database to prevent accidental writes.
Configure your applications to connect to the encrypted database instance
Restart and verify the applications functionality
Keep the unencrypted instance for a period of time before removing it.
Clean up
To avoid incurring future charges, delete the resources you created.
Remove the publication
Remove the replication slot:
Disable the PostgreSQL logs and log publishing to CloudWatch (optional).
Delete the CloudWatch group:
Delete the unencrypted database if you no longer need it.
Conclusion
In this post, we showed how to convert an unencrypted RDS for PostgreSQL database to an encrypted database with minimum downtime. The process involves using an encrypted snapshot for restoration to an encrypted database instance followed by using native replication for CDC between the unencrypted and encrypted database instances. The benefit of this approach is that the overall time taken for this activity for especially large database is less. After the encrypted database has caught up with the changes on the unencrypted database, you can switch your applications to the encrypted database with minimal downtime.
If you have questions, comments, or feedback, leave them in the comments section.
About the Authors
Santosh Bhupathi is a Senior Database Specialist Solution Architect based in Philadelphia. With a focus on Relational and No-SQL databases, he provides guidance and technical assistance to customers to help them design, deploy, and optimize database workloads on AWS.
Rajeev Sakhuja is a Solution Architect based out of New York City. He enjoys partnering with customers to solve complex business problems using AWS services. In his free time, he likes to hike, and create video courses on application architecture & emerging technologies; check him out on Udemy.
Read MoreAWS Database Blog