Amazon Relational Database Service (Amazon RDS) makes it easy to set up, operate, and scale MySQL and MariaDB deployments in the cloud. Developers are often required to run complex queries in MySQL and MariaDB environments to create a dashboard or run simple ad hoc reports. Furthermore, application developers use complex stored procedures with user-created temporary tables to store intermediate results. To process these complex queries, MySQL or MariaDB creates internal temporary tables and files.
In this post, we discuss how MySQL and MariaDB engines process these complex queries and how the Optimized Reads feature for Amazon RDS for MySQL and Amazon RDS for MariaDB can improve the performance of these complex queries. We also discuss the use cases and best practices that are more suitable for Optimized Reads.
Internal temporary tables and files
The MySQL query optimizer creates internal temporary tables and files to process complex queries. For example, internal temporary tables get created when processing a query using DISTINCT combined with ORDER BY or when working with common table expressions (CTEs). Internal temporary tables are often created in memory first for best query performance; for example, when a BLOB or TEXT column in the table prevents using in-memory temporary table. If the table size exceeds the defined memory configurations, then it will overflow to disk storage. In addition, queries performing grouping and sorting operations use files on disk storage to process the results. These files are removed soon after the completion of the associated queries.
The following diagram illustrates the simplistic view of internal temporary table creation in MySQL and MariaDB.
In MariaDB, internal temporary tables are created using the memory storage engine by default. When the table size reaches max_heap_table_size or tmp_table_size, tables are converted to disk-based internal temporary tables. These disk-based internal temporary tables use either Aria or MyISAM, depending on the aria_used_for_temp_tables system variable. This variable, enabled by default in Amazon RDS for MariaDB and internal temporary tables on disk, is configured to use the Aria storage engine.
In MySQL 8.0, you can control the storage engine for in-memory internal temporary tables using internal_tmp_mem_storage_engine. The engine uses the TempTable storage engine as the default storage engine for the in-memory internal temporary tables, and it provides efficient storage for VARCHAR and VARBINARY columns and other binary large object types. Depending on the workload, you can customize the TempTable storage engine behavior. For example, the temptable_max_ram variable defines the maximum amount of RAM that can be used by the TempTable storage before it allocates space for memory-mapped files or before using InnoDB on-disk internal temporary tables. The variable tmp_table_size defines the maximum size of any individual in-memory internal temporary table. For more information about TempTable storage engine behavior on Amazon RDS for MySQL, see Use the TempTable storage engine on Amazon RDS for MySQL and Amazon Aurora MySQL.
In both MySQL and MariaDB, you can create temporary tables (user-created) to store intermediate results before writing to a persistent table or to perform a test.
Improving performance with Optimized Reads
Optimized Reads are supported on Amazon RDS for MySQL versions 8.0.28 and higher and Amazon RDS for MariaDB versions 10.4.25, 10.5.16, 10.6.10, and higher. On RDS for MySQL and RDS for MariaDB instances that do not have Optimized Reads enabled, both persistent and non-persistent objects like temporary tables and files are stored in Amazon Elastic Block Store (Amazon EBS). Depending on the amount of storage provisioned, Amazon RDS automatically provisions a single or multiple EBS volumes to enhance the performance of read and write operations. On optimized read-enabled instances, the temporary objects (like tables and files) are stored on the local NVMe-based SSD block-level storage that’s physically connected to the host server (instance store). This storage is optimized for low latency, high random I/O performance, and high sequential read throughput. As of writing this post, the following instance class families support Optimized Reads: db.r5d, db.r6gd, db.m5d, db.m6gd, x2iedn and x2idn. For an up-to-date list of the instances supported for Amazon RDS Optimized Reads, see RDS for MySQL and RDS for MariaDB user guides. The instance type that you choose determines the size of the instance store volume. See Instance store volumes for details about instance store volume size allocated for each instance class.
The following temporary objects are stored in the instance store volumes for Optimized Reads: internal temporary files, internal on-disk temporary tables, user-created temp tables, memory map files created by the TempTable engine, and temporary binlog cache files. Offloading the temporary objects to the instance store volume benefits use cases such as queries that involve large temporary objects like temporary tables or sort files. Furthermore, it helps when application queries involve using temporary user-created tables, complex CTEs, derived tables, or sub-queries. For more information, see Use cases for RDS Optimized Reads.
Identifying workload patterns that can benefit from Optimized Reads
There are several ways to determine whether Optimized Reads will benefit your workload. The following are some of the metrics that you can use to monitor your workload:
Monitor internal temporary table usage on your instance by checking the following status variables in the output of the SHOW GLOBAL STATUS command. You can run this command in intervals of a few minutes during the busy period to calculate the number of temporary tables created during that interval.
Created_tmp_disk_tables – Indicates the number of internal temporary tables that are converted to the disk-based temporary tables command in MySQL and MariaDB.
Created_tmp_files – Indicates the number of temporary files being created on the instance for different query runs. These are the files created as part of file sort and other operations.
If Amazon RDS Performance Insights is enabled on your instance, you can monitor these counter metrics in real time or over a specific period of time. See Overview of the Performance Insights dashboard for more information.
You can monitor server status variables over time by enabling Global Status History (GoSH) in RDS for MySQL and RDS for MariaDB instances.
Enable slow query logs to capture all slow queries on the instance and check the run plan of the queries logged in the slow query log. If any application-critical queries use file sort or temporary tables, those queries can take advantage of Optimized Reads. Note that it’s not always possible to optimize every single query that uses temporary files or tables during query processing because it can have a negative impact on the write operations. In these cases, these queries can benefit from optimized reads. The amount of data being processed during the query has a direct impact on the performance improvement gained with the feature usage.
In MySQL, you can monitor the usage of temporary storage on disk by the TempTable engine by using the following query. Usually, the provided value indicates the overall amount of data written to disk due to the usage of memory mapped files. A huge value indicates a high usage of disk-based temporary objects, and Optimized Reads can be an ideal feature to evaluate and understand the additional value it can add to the application performance.
The following query provides metadata about session temporary tablespaces used for internal and user-created temporary tables:
For Optimized Reads on a Multi-AZ instance, data written to the primary instance store volume isn’t replicated to the standby instance. This provides performance improvement for queries that use disk-based temporary tables.
After you find the workload pattern that can benefit from Optimized Reads, it’s recommended to test the application workload on staging or with sample production queries to understand the performance differences before moving to production workloads.
Even after evaluating the use cases and workload patterns, it’s always important to benchmark the performance gains.
Let’s do a quick benchmarking to evaluate the performance between an Optimized Read and non-optimized read instance. For this benchmark, we deployed an RDS for MySQL 8.0.30 instance and populated data using the sysbench open-source benchmark tool. We prepared the test database with a single table (sbtest1) with 600 million rows using the following statement:
Next, we took a snapshot of the instance and launched two new Multi-AZ instances using the snapshot. We used an Optimized Read instance with m6gd.4xlarge and a non-optimized read instance with m6g.4xlarge. Both instances use 2000 GiB IO1 storage with 40000 IOPS. The following code shows the sample query and its run plan. As we can see in the Extra column, the optimizer used a temporary table and performed a file sort to run the query:
Before the query ran, the instance was warmed up to minimize the impact of lazy loading. We started the instance with the innodb_buffer_pool_load_at_startup parameter set to 0 in a custom parameter group to ensure the InnoDB buffer pool was empty. Finally, we used the mysqlslap utility to run the query in parallel with a concurrency of five. The following table lists the average runtime reported by the mysqlslap.
Runtime in Seconds
As we can see from these results, the SELECT query that used temporary tables and file sort took 27,070 seconds on the m6g.4xlarge instance. The same query completed within 11,768 seconds on the m6gd.4xlarge instance, which was using the read-optimized feature. Without optimized reads, the same query took twice as long to run.
We captured the metrics during this testing to understand the behavior. The first graph compares the WriteIOPS metric between the optimized and non-optimized read instance. The optimized read instance used more than 25k IOPS of instance store volume process the SELECT query in half the time while non-optimized instance uses EBS volume with just 10k IOPS.
The second graph compares the ReadIOPS metric between two instances. The Optimized Read instance used double the ReadIOPS from the instance store volume when compared with non-optimized read instance using the EBS volume to process the SELECT query.
This is just an example to understand the effectiveness of the instance store used by the Optimized Reads feature. The actual performance improvement also depends on how many disk-based temporary objects (and size of the objects) are created by your queries. In addition, while the queries were in progress, the minimal number of IOPS (EBS volumes) was utilized by the optimized read instance. This means the additional IOPS burden on Amazon EBS was lifted, and the same resources (IOPS) could be utilized by other queries being run on the instance to operate on persistent database objects.
Monitoring LocalStorage metrics on RDS for MySQL and RDS for MariaDB instances
Amazon RDS sends metric data to Amazon CloudWatch in 1-minute intervals. You can monitor the utilization of local storage volume of the instance using the following CloudWatch metrics:
FreeLocalStorage – The amount of available local storage space for the DB instance in MB
ReadIOPSLocalStorage – The average number of disk read I/O operations to local storage per second
WriteIOPSLocalStorage – The average number of disk write I/O operations to local storage per second
ReadLatencyLocalStorage – The average amount of time taken per disk I/O operation for local storage
WriteLatencyLocalStorage – The average amount of time taken per disk I/O operation on local storage
WriteThroughputLocalStorage – The average number of bytes written to disk per second for local storage
ReadThroughputLocalStorage – The average number of bytes read from disk per second for local storage
The FreeLocalStorage metric provides details about the available local storage space. We recommend monitoring this metric, or you can set up a CloudWatch alarm so that you get a notification before storage gets filled up. To set up an alarm, see Creating CloudWatch alarms to monitor Amazon RDS.
How to take advantage of optimized reads
To start using optimized reads, RDS for MySQL you can launch instance with version 8.0.28 or higher or RDS for MariaDB you can launch instance with version 10.4.25, 10.5.16. 10.6.10, and higher. Then select one of the supported instance classes. That will automatically launch an RDS for MySQL/MariaDB instance with the optimized reads feature. If you have an existing instance running one of the above versions, you can create an RDS read replica with an optimized read-supported instance class and perform a test with your workload. After that, you can move your production workload to the read replica or promote that to a primary instance.
Best practices for optimized reads
In this section, we discuss some of the best practices when using optimized reads:
The size of the local storage volume is dependent on the instance class. Choose an instance class that provides sufficient local storage and CPU/memory resources for the workload.
When the instance store volume limit is reached, the query that utilized the storage space will fail with a table full error. For example, importing data using the LOAD DATA LOCAL statement creates cache files on the instance store volume of the RDS instance. It can consume full storage space and lead to query failures. To avoid this, run the data loading operation in batches. Also, consider setting up an alarm for the FreeLocalStorage metric. If the local storage gets filled up frequently, consider increasing the instance class to get more local storage space.
When backups are enabled in Amazon RDS, it automatically enables the binary logs for point-in-time recovery. When binary logging is enabled, all writes are first cached in memory before writing changes to the binary log file. The memory allocation is at the session or connection level and configured using the parameter binlog_cache_size. If the data for the transaction is bigger than this variable, the excess data is stored in a temporary file on disk. On optimized read-enabled instances, these files are created on local storage and can fill the local storage space. If you have large transactions, consider splitting them into small transactions to reduce the use of temporary files.
In case of Amazon RDS for MariaDB, user-created temporary InnoDB tables use a common temporary (ibtmp1) tablespace. This temporary tablespace will increase, but won’t decrease in size when temporary tables are dropped. The engine deletes this temporary tablespace when the server shuts down gracefully, and the tablespace is recreated when it starts again. Monitor your instance store volume using FreeLocalStorage metrics when you are working on many InnoDB user-created temp tables.
Consider benchmarking your workload or queries on Optimized Read-enabled instances to understand the performance gains. For write-heavy workloads, evaluate the optimized writes feature accordingly.
For more best practices, see Best practices for RDS Optimized reads.
In this post, we showed you how Optimized Reads can improve the performance of read queries that use temporary tables to process the query. We also discussed common use cases and best practices. The Amazon RDS Optimized Reads feature is available on Amazon RDS for MySQL version 8.0.28 and higher and Amazon RDS for MariaDB version 10.4.25, 10.5.16, 10.6.10, and higher. We encourage you to try out this new feature for your MySQL and MariaDB workloads and see how it improves the performance of your applications.
For complete details about the feature, see Improving query performance for RDS for MySQL and RDS for MariaDB with Amazon RDS Optimized Reads.
About the Authors
Surendar Munimohan is a Sr. Database Solutions Architect with Amazon Web Services.
Chelluru Vidyadhar is a Database Engineer with the Amazon RDS team at Amazon Web Services.
Read MoreAWS Database Blog