Amazon Aurora, a high-performance, fully managed relational database service offered by Amazon Web Services (AWS), provides a Blue/Green Deployments feature that makes database updates safer, simpler, and faster. A blue/green deployment creates a fully managed staging environment using logical replication, which lets you deploy and test production changes while protecting your current production database. The blue environment represents the database currently managing the production workload, while the green environment runs with the updates or changes you want to apply. Blue/Green Deployments minimize the risk and reduce the downtime associated with database updates or changes, such as major or minor engine version upgrades and system updates. You can switch over your staging environment to become the new production environment without any changes to your application endpoint.
Despite careful planning and rigorous testing in the green environment, unforeseen issues might occasionally arise after the blue/green deployment switchover. For instance, application compatibility problems with the new production environment that went unnoticed during the testing phase could surface. Additionally, application performance could degrade because of increased workload or resource demands in the new production environment that were not accurately simulated during testing. In such cases, having a rollback plan becomes crucial.
In this post, we discuss the steps to perform a blue/green deployment switchover and how to set up and perform a rollback strategy post switchover for Amazon Aurora MySQL-Compatible Edition.
Solution overview
Following a successful Aurora MySQL blue/green deployment creation and switchover, the green environment becomes the new production environment. The names and endpoints from the current production environment are assigned to the newly promoted production environment, with no changes needed to your application. After the switchover, the old production environment is no longer synchronized with the newly promoted production environment. The DB cluster and DB instances in the old production environment are renamed by appending -oldn to the current name, where n is a number. This old environment can serve as the rollback target. In this approach, we manually set up logical replication from the new production environment to the old environment after the switchover, so that you have a rollback plan in case of an unexpected issue.
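If you script around the switchover, it can help to recognize which cluster identifier is the renamed old environment. The following is a minimal Python sketch of the -oldn naming rule described above. The function name and the sample identifiers are hypothetical, and the sketch assumes that n is one or more decimal digits appended directly after -old.

```python
import re


def is_old_environment_name(candidate: str, original: str) -> bool:
    """Return True if candidate looks like original renamed with an -oldn suffix.

    Assumption: n is one or more decimal digits, as in mycluster-old1.
    """
    pattern = re.escape(original) + r"-old[0-9]+"
    return re.fullmatch(pattern, candidate) is not None


if __name__ == "__main__":
    # Hypothetical identifiers, for illustration only.
    print(is_old_environment_name("aurora-prod-old1", "aurora-prod"))
    print(is_old_environment_name("aurora-prod", "aurora-prod"))
```

Matching on the full identifier, rather than on a substring, avoids treating an unrelated cluster whose name merely contains -old as the rollback target.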
The following diagram illustrates the rollback strategy after the Aurora MySQL blue/green deployment switchover.
To implement this solution, the high-level steps are:
Prepare the green environment (staging) for rollback.
Perform the Aurora MySQL blue/green deployment switchover.
Delete the Aurora MySQL blue/green deployment. At this stage, deleting a blue/green deployment doesn’t affect any of your database environments.
Configure logical replication from the production environment to the rollback environment.
In case of any issues in the production environment, switch to the rollback environment.
In this post, we use Aurora MySQL-Compatible Edition version 2 (MySQL 5.7) as the blue environment and Aurora MySQL-Compatible Edition version 3 (MySQL 8.0) as the green environment.
Prerequisites
You need the following components to implement this solution:
An Aurora MySQL blue/green deployment. For setup instructions, see Creating a blue/green deployment.
The Aurora MySQL green cluster must be associated with a custom database (DB) cluster parameter group. For more information about Aurora DB cluster parameter groups, see Working with DB cluster parameter groups.
Limitations
These are limitations associated with this solution:
MySQL doesn’t officially support replication from a source database instance running a higher major version to a target database instance running a lower major version; therefore, replication might encounter problems (for more details, see Replication Compatibility Between MySQL Versions). Because the rollback strategy is needed only for a short period while you settle into the new environment, make sure that during this time you don’t use database features that are only available in the new database version. Using them can cause MySQL replication errors and prevent you from rolling back to the old version.
Managing schema changes in a MySQL replication environment necessitates careful planning. Some schema changes, such as ALTER TABLE, CREATE TABLE, or DROP TABLE statements, may not function with this rollback solution.
Prepare the green environment for rollback
Before the blue/green deployment switchover, configure the green environment for rollback:
Configuring Aurora MySQL binary logging
Before the Aurora MySQL blue/green deployment switchover, ensure that binary logging is active and capturing binary logs in the green environment. These binary logs are crucial for continuous replication. By default, binary logs are disabled on Aurora MySQL and don’t need to be enabled unless data is being replicated out of the Aurora cluster. To enable binary logs, set the binlog_format parameter to ROW or MIXED in the custom DB cluster parameter group attached to the source DB cluster (here, the green cluster). Because binlog_format is a static parameter, the writer DB instance of your cluster must be rebooted for the change to take effect. We therefore recommend setting binlog_format in the green environment before the blue/green deployment switchover, which avoids a reboot or outage after the switchover. For more information about MySQL binary logging, see Configuring Aurora MySQL binary logging.
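To confirm that binary logging is active on the green Aurora DB cluster, connect to your instance and check the log_bin and binlog_format variables, for example with SHOW VARIABLES LIKE 'log_bin' and SHOW VARIABLES LIKE 'binlog_format'.
In this post, we set the binlog_format parameter to ROW, so log_bin should report ON and binlog_format should report ROW.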
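The check above can be sketched as two statements plus a small evaluation step. This is a minimal Python sketch, not a definitive implementation: the SHOW VARIABLES statements are standard MySQL, but the function name binlog_ready and the dictionary shape are assumptions made for illustration, and running the statements through a client library of your choice is left out.

```python
# Statements to run on the writer instance of the green cluster.
CHECK_STATEMENTS = (
    "SHOW VARIABLES LIKE 'log_bin';",
    "SHOW VARIABLES LIKE 'binlog_format';",
)

# Formats that this post treats as enabling binary logging for replication.
ACCEPTED_FORMATS = {"ROW", "MIXED"}


def binlog_ready(variables: dict) -> bool:
    """Evaluate SHOW VARIABLES results given as {variable_name: value}."""
    log_bin = str(variables.get("log_bin", "")).upper()
    binlog_format = str(variables.get("binlog_format", "")).upper()
    return log_bin == "ON" and binlog_format in ACCEPTED_FORMATS


if __name__ == "__main__":
    # Hypothetical values, as they might be collected from the two statements.
    print(binlog_ready({"log_bin": "ON", "binlog_format": "ROW"}))
```

A missing variable is treated as not ready, so an incomplete query result cannot be mistaken for a healthy configuration.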
Configuring Aurora MySQL binary log retention
When using logical replication as a rollback approach, it’s essential that binary logs on the green environment are retained long enough. To set the binary log retention time, use the mysql.rds_set_configuration procedure and specify the configuration parameter binlog retention hours, along with the number of hours to retain binary logs on the DB cluster. The maximum value for Aurora MySQL 3.x and Aurora MySQL version 2.11 and later is 90 days (2,160 hours). The default value of binlog retention hours is NULL, which means that binary logs are not retained. The retention period must cover the maximum replication delay between the production and rollback environments, which includes the time needed to complete the blue/green switchover and configure replication in the rollback environment. For more information about Aurora MySQL binary log retention, see configuring binary log retention.
In this post, we set the binlog retention hours to 24 on the green environment, which should provide ample time to perform a blue/green switchover and configure replication back to the rollback environment.
Validate the binary log retention after the change by calling the mysql.rds_show_configuration procedure.
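As an illustration of the retention settings described above, the following Python sketch builds the mysql.rds_set_configuration call and rejects values above the 90-day maximum mentioned earlier. The function and constant names are hypothetical, and the lower bound of 1 hour is an assumption of this sketch, not a documented limit.

```python
MAX_RETENTION_HOURS = 90 * 24  # 90-day maximum stated in this post


def retention_statement(hours: int) -> str:
    """Build the call that sets binary log retention on the DB cluster."""
    if isinstance(hours, bool) or not isinstance(hours, int):
        raise TypeError("hours must be an integer")
    # Assumption: at least 1 hour, because NULL (no retention) defeats the purpose.
    if not 1 <= hours <= MAX_RETENTION_HOURS:
        raise ValueError(f"hours must be between 1 and {MAX_RETENTION_HOURS}")
    return f"CALL mysql.rds_set_configuration('binlog retention hours', {hours});"


# Statement that displays the current setting after the change.
SHOW_CONFIGURATION = "CALL mysql.rds_show_configuration;"


if __name__ == "__main__":
    print(retention_statement(24))
    print(SHOW_CONFIGURATION)
```

Run the generated statements on the green environment before the switchover, so that retention is already in place when you capture the binary log coordinates.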
Perform the Aurora MySQL blue/green deployment switchover
A switchover involves promoting the DB cluster and its associated DB instances from the green environment to become the new production DB cluster. Prior to the switchover, production traffic is directed to the cluster in the blue (current production) environment. After the successful switchover, production traffic is then routed to the newly promoted DB cluster, which was previously the green (staging) environment.
Perform the Aurora MySQL blue/green deployment switchover. For information about the steps, see Switching a blue/green deployment, and for switchover best practices, see Switchover best practices.
When the switchover is complete, you will see “Old Blue” and “New Blue” next to the database in the AWS Management Console for Aurora. The new blue environment is your newly promoted production environment.
After the switchover, the old blue DB cluster only allows read operations until you reboot.
After the switchover, the Aurora MySQL DB cluster and DB instances in the old production environment are retained. Standard costs apply to these resources.
After the switchover, capture the binary log file name and position from the new blue environment. For more details, see get binary log coordinates. The following AWS Command Line Interface (AWS CLI) command helps find the current binary log file name and position:
Delete the blue/green deployment.
Note: When you delete a blue/green deployment before switching it over, Amazon RDS optionally deletes the DB cluster in the green environment. Deleting a blue/green deployment doesn’t affect the blue environment. For more information, see deleting a blue/green deployment.
After you delete the blue/green deployment, new and old production DB clusters are in available status, as shown in the following image.
Set the old production DB cluster to read-only mode to avoid any write operations that could cause problems with replication from the new production environment to the old environment. To prevent write operations on the DB cluster, enable read_only database mode by changing the read_only value in DB cluster parameter group from 0 to 1.
Because read_only is a dynamic parameter, the change takes effect without a reboot.
Connect to the old production DB cluster and validate the DB instance mode after the change is applied, for example by running SELECT @@global.read_only; a value of 1 confirms that read-only mode is on.
Note: After deletion of the Aurora blue/green deployment, the terms “blue” or “green” are no longer relevant. For this post, we refer to the “new production” or “new blue” environment as the production environment, and the “old production” or “old blue” environment as the rollback environment.
Configure a replication user on the production DB cluster
In the production database, create a replication database user. In this post, we create a user named repl_user.
The user requires the REPLICATION CLIENT and REPLICATION SLAVE privileges. Grant these privileges to the user.
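The two steps above might look like the following. This is a hedged Python sketch that only builds the statements: the helper name is hypothetical, the '%' host pattern is an assumption (restrict it to the rollback cluster's network range where you can), and the password is deliberately left as a placeholder.

```python
import re

_IDENTIFIER = re.compile(r"[A-Za-z0-9_]+")


def replication_user_statements(user: str, host: str = "%") -> list:
    """Return CREATE USER and GRANT statements for a replication account.

    The password stays a placeholder on purpose; substitute it when you run
    the statement instead of storing it in code.
    """
    if not _IDENTIFIER.fullmatch(user):
        raise ValueError("user may contain only letters, digits, and underscores")
    if "'" in host or "\\" in host:
        raise ValueError("host must not contain quotes or backslashes")
    account = f"'{user}'@'{host}'"
    return [
        f"CREATE USER {account} IDENTIFIED BY '<password>';",
        f"GRANT REPLICATION CLIENT, REPLICATION SLAVE ON *.* TO {account};",
    ]


if __name__ == "__main__":
    for statement in replication_user_statements("repl_user"):
        print(statement)
```

Both privileges are global, which is why the GRANT statement targets *.* and not a single schema.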
Configure binary log replication from production to rollback DB cluster
Connect to the rollback environment DB cluster endpoint and configure manual MySQL replication by calling the mysql.rds_set_external_master stored procedure. Use the MySQL binlog file name and position that you collected earlier.
Start the MySQL replication process from the rollback environment by calling mysql.rds_start_replication.
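As a sketch of these two calls, the following Python helper assembles the mysql.rds_set_external_master and mysql.rds_start_replication statements from the captured coordinates. The helper names, the sample endpoint, and the sample coordinates are hypothetical; the argument order follows the documented procedure signature (host, port, user, password, binary log file, binary log position, SSL flag), and the password is left as a placeholder.

```python
def _check_text(name: str, value: str) -> None:
    """Reject empty values and characters that would break the quoted literal."""
    if not value or any(ch in value for ch in "'\\;"):
        raise ValueError(f"{name} is empty or contains an unsafe character")


def external_master_statements(host, port, user, binlog_file, binlog_pos, ssl=False):
    """Return the statements that configure and start replication.

    Run them on the rollback DB cluster; host is the production cluster endpoint.
    """
    for name, value in (("host", host), ("user", user), ("binlog_file", binlog_file)):
        _check_text(name, value)
    if not 1 <= int(port) <= 65535:
        raise ValueError("port must be between 1 and 65535")
    if int(binlog_pos) < 0:
        raise ValueError("binlog_pos must not be negative")
    return [
        "CALL mysql.rds_set_external_master("
        f"'{host}', {int(port)}, '{user}', '<password>', "
        f"'{binlog_file}', {int(binlog_pos)}, {1 if ssl else 0});",
        "CALL mysql.rds_start_replication;",
    ]


if __name__ == "__main__":
    # Hypothetical endpoint and coordinates, for illustration only.
    for statement in external_master_statements(
        "prod.cluster-example.us-east-1.rds.amazonaws.com",
        3306,
        "repl_user",
        "mysql-bin-changelog.000003",
        154,
    ):
        print(statement)
```

Using the exact file name and position captured right after the switchover matters, because a later position would skip changes and an earlier one would replay changes that the rollback cluster already contains.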
Starting with Aurora MySQL version 3, data control language (DCL) statements such as CREATE USER, GRANT, and REVOKE are no longer replicated with the binary log replication. If you plan to run any DCL statements while replication is ongoing, you will need to run DCL statements on both the source and target databases.
Validate the MySQL replication status by running SHOW SLAVE STATUS\G on the rollback environment, and make sure the replication process is up and running without any errors. In particular, Slave_IO_Running and Slave_SQL_Running should both report Yes.
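To make that validation repeatable, you could evaluate the SHOW SLAVE STATUS fields programmatically. The following Python sketch assumes that the status row has already been fetched into a dictionary keyed by the standard column names; the function name and the wording of the messages are hypothetical.

```python
def replication_problems(status: dict) -> list:
    """Return human-readable problems found in one SHOW SLAVE STATUS row.

    An empty list means both replication threads run and no error is recorded.
    """
    problems = []
    if status.get("Slave_IO_Running") != "Yes":
        problems.append("I/O thread is not running")
    if status.get("Slave_SQL_Running") != "Yes":
        problems.append("SQL thread is not running")
    for errno_key, error_key in (
        ("Last_IO_Errno", "Last_IO_Error"),
        ("Last_SQL_Errno", "Last_SQL_Error"),
    ):
        # Error numbers are 0 when the corresponding thread has no error.
        if int(status.get(errno_key) or 0) != 0:
            problems.append(f"{errno_key}={status[errno_key]}: {status.get(error_key, '')}")
    return problems


if __name__ == "__main__":
    # Hypothetical row, for illustration only.
    row = {
        "Slave_IO_Running": "Yes",
        "Slave_SQL_Running": "Yes",
        "Last_IO_Errno": 0,
        "Last_SQL_Errno": 0,
    }
    print(replication_problems(row))
```

Returning a list of problems, not a single flag, makes it easier to alert on the specific thread or error that needs attention.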
Rollback steps
In the event of a rollback scenario from the production to the rollback environment, follow these steps:
Monitor the replication lag to make sure the rollback environment can keep up with the production environment without significant lag. To check the status, run SHOW SLAVE STATUS\G on the rollback DB cluster and review Seconds_Behind_Master.
After you find that there is no significant replication lag, stop all application connections on the production DB.
To prevent write operations on the production DB cluster, you can enable read_only database mode by changing the read_only value in the DB cluster parameter group from 0 to 1.
After enabling the read_only database mode, verify that the binary log position remains unchanged by running SHOW MASTER STATUS\G on the production environment. This confirms that no additional data changes are occurring in the database.
Validate that all binary log files and events have been replicated from the production to the rollback DB cluster. To check the status, run SHOW SLAVE STATUS\G on the rollback DB cluster.
In the output, validate that Seconds_Behind_Master is 0. For more information, see Seconds_Behind_Master. Furthermore, run the SHOW MASTER STATUS command on the production DB cluster to ensure that the production and rollback DB clusters are in sync.
Compare the Relay_Master_Log_File and Exec_Master_Log_Pos from the rollback environment with the File and Position from the production environment. These values should be consistent before you stop the replication process.
To stop the replication process, call mysql.rds_stop_replication on the rollback DB cluster.
To remove the MySQL replication configuration information, call mysql.rds_reset_external_master on the rollback DB cluster.
Turn off the read_only database mode for the rollback DB cluster by changing the read_only parameter in the custom DB cluster parameter group from 1 to 0.
Update your application configuration to use the rollback DB cluster endpoints and start the application. Optionally, terminate all connections on the production cluster to ensure they reconnect to the rollback cluster. The rollback DB cluster environment now becomes your production environment.
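The synchronization check in the rollback steps above can be sketched in Python as follows. It assumes that the SHOW SLAVE STATUS row from the rollback cluster and the SHOW MASTER STATUS row from the production cluster have already been fetched into dictionaries keyed by the standard column names; the function and constant names are hypothetical.

```python
def ready_to_cut_over(replica_status: dict, source_status: dict) -> bool:
    """Return True when the rollback cluster has applied everything from production.

    replica_status: one SHOW SLAVE STATUS row from the rollback cluster.
    source_status: one SHOW MASTER STATUS row from the production cluster.
    """
    lag = replica_status.get("Seconds_Behind_Master")
    # A missing or NULL lag means replication is not running, so it is not safe.
    if lag is None or str(lag) != "0":
        return False
    replica_file = replica_status.get("Relay_Master_Log_File")
    if replica_file is None or replica_file != source_status.get("File"):
        return False
    replica_pos = replica_status.get("Exec_Master_Log_Pos")
    source_pos = source_status.get("Position")
    if replica_pos is None or source_pos is None:
        return False
    return int(replica_pos) == int(source_pos)


# Statements to run on the rollback cluster once the check passes, in order.
TEARDOWN_STATEMENTS = (
    "CALL mysql.rds_stop_replication;",
    "CALL mysql.rds_reset_external_master;",
)


if __name__ == "__main__":
    # Hypothetical rows, for illustration only.
    replica = {
        "Seconds_Behind_Master": 0,
        "Relay_Master_Log_File": "mysql-bin-changelog.000007",
        "Exec_Master_Log_Pos": 1520,
    }
    source = {"File": "mysql-bin-changelog.000007", "Position": 1520}
    print(ready_to_cut_over(replica, source))
```

Comparing the file name and position in addition to Seconds_Behind_Master guards against a lag value of 0 that is reported while the last events have not yet been applied.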
Clean up
To avoid incurring future charges, consider deleting Aurora MySQL DB cluster resources that are no longer in use or will not be used in the future.
Summary
In this post, we provided a step-by-step procedure for implementing a rollback strategy following a switchover in an Aurora MySQL blue/green deployment. Given the intricacies of MySQL replication, we strongly advise thorough testing in non-production environments before deploying in a production environment. The solution provides a rollback plan in case any issues arise in a production environment after the blue/green deployment switchover.
We invite you to leave your feedback in the comments section.
About the authors
Daxeshkumar Patel is a Database Consultant on the Professional Services team at Amazon Web Services. He works with customers and partners in their journey to the AWS Cloud with a focus on database migration and modernization programs.
Kamal Singh is a Senior Database Consultant at Amazon Web Services. He works with customers and partners in their journey to the AWS Cloud with a focus on database migration and modernization programs.
Bhavesh Rathod is a Principal Database Consultant with the Professional Services team at Amazon Web Services. He works as a database migration specialist to help Amazon customers migrate their on-premises database environment to AWS Cloud database solutions.