Monday, December 2, 2024
No menu items!
HomeDatabase ManagementBenchmark Amazon RDS for PostgreSQL with Dedicated Log Volumes

Benchmark Amazon RDS for PostgreSQL with Dedicated Log Volumes

In today’s data-driven landscape, organizations must ensure their mission-critical databases can handle demanding workloads without sacrificing performance. Amazon Relational Database Service (Amazon RDS) is a managed service offering a cost-efficient, reliable, and highly scalable managed service that satisfies such requirements.

In PostgreSQL, as with many relational databases, transactions are first logged in a Write-Ahead Log (WAL) buffer before being recorded in a separate WAL segment. This process is crucial for data recovery and replication. The performance of PostgreSQL transactions is affected by the storage system where these WAL segments reside. By default, all files, including WAL, are stored in the same location. However, PostgreSQL documentation recommends placing the WAL on separate storage to enhance performance. As detailed in the post Enhance database performance with Amazon RDS dedicated log volumes, Amazon introduced the Dedicated Log Volume (DLV) feature, enabling customers to store WAL segments on a separate volume with a fixed size of 1,024 GiB and 3,000 Provisioned IOPS.

One key advantage of DLV is that it enables write coalescing, which reduces overall IOPS demand for the WAL by grouping log writes into larger, more efficient IO operations. In typical storage setups, log and data writes are mixed in smaller, frequent operations, whereas DLV keeps them separate, allowing logs to be written in larger chunks. This separation increases throughput and IO efficiency, potentially lowering the need for high-performance storage and reducing costs without sacrificing performance.

In this post, we guide you through the process of benchmarking the performance of Amazon RDS for PostgreSQL using the Dedicated Log Volume (DLV) feature. To do this, we use pgbench – a tool for running benchmark tests on PostgreSQL databases, pgbench repeatedly executes a defined sequence of SQL commands across multiple concurrent database sessions. By analyzing the average transaction rate, measured in transactions per second, we can gain valuable insights into the performance characteristics of your RDS for PostgreSQL deployment. This allows you to make a direct comparison between the performance of Amazon RDS for PostgreSQL with and without the Dedicated Log Volume (DLV) feature enabled.

Through our benchmarking, you’ll learn how to quantify the performance improvements delivered by DLV. Let us get started and explore the potential of Amazon RDS for PostgreSQL with Dedicated Log Volume!

Solution overview

The solution involves running a benchmarking script on an Amazon Elastic Compute Cloud (Amazon EC2). The script tests the performance against two separate RDS for PostgreSQL instances where one instance has DLV enabled and the other does not have DLV enabled.

The following diagram illustrates the high-level architecture

Architecture

Prerequisites

To get started, you must complete the following prerequisites:

  1. Launch Amazon EC2 Linux instance . It is recommended to launch the instance in the same Virtual Private Cloud (Amazon VPC) as your RDS instance to minimize network latency.
  2. Install pgbench on the EC2 instance. Since the PostgreSQL server installation package includes the pgbench utility by default, you can install PostgreSQL from the amazon extras library.
    sudo amazon-linux-extras enable postgresql15
    sudo yum install postgresql-server -y
  1. You need a psql client with connectivity to your Amazon RDS for PostgreSQL instance. The previous step of PostgreSQL server installation also includes the psql client by default.
  1. Launch two Amazon RDS for PostgreSQL instances one with DLV and one without.

To create instances, use the following AWS Command Line Interface (AWS CLI) command, providing your actual AWS Region and AWS account number. The parameter --dedicated-log-volume enables the DLV feature, and --no-dedicated-log-volume disables it. Refer to Enabling DLV when you create a DB instance for more information.

The following code creates an RDS for PostgreSQL instance with DLV:

aws rds create-db-instance 
    --db-instance-identifier dlv 
    --db-instance-class db.r5.16xlarge 
    --engine postgres 
    --engine-version 15.8 
    --master-username postgres 
    --master-user-password $(aws secretsmanager get-secret-value --secret-id rds/db_master_password --query SecretString --output text | jq -r '.master_user_password') 
    --allocated-storage 1024 
    --enable-performance-insights 
    --performance-insights-retention-period 31 
    --monitoring-interval 1 
    --monitoring-role-arn arn:aws:iam::<your_aws_account_number>:role/rds-monitoring-role 
    --region <Your_RDS_region>   
    --iops 10000      
    --storage-type io2 
    --dedicated-log-volume

The --no-dedicated-log-volume flag is not required during instance creation, as instances are created as non-DLV by default. The following code creates an RDS for PostgreSQL instance without DLV:

aws rds create-db-instance 
    --db-instance-identifier dlv 
    --db-instance-class db.r5.16xlarge 
    --engine postgres 
    --engine-version 15.8 
    --master-username postgres 
    --master-user-password $(aws secretsmanager get-secret-value --secret-id rds/db_master_password --query SecretString --output text | jq -r '.master_user_password') 
    --allocated-storage 1024 
    --enable-performance-insights 
    --performance-insights-retention-period 31 
    --monitoring-interval 1 
    --monitoring-role-arn arn:aws:iam::<your_aws_account_number>:role/rds-monitoring-role 
    --region <Your_RDS_region>   
    --iops 10000      
    --storage-type io2

We used AWS Secrets Manager to store the master user account password. Refer to Improve security of Amazon RDS master database credentials using AWS Secrets Manager for more information on using Secrets Manager to improve the security of your RDS instance.

Performance benchmarking

This section discusses the benchmark we ran to compare performance between an RDS for PostgreSQL instance with DLV enabled and an instance with no DLV.

Infrastructure

The following table summarizes the infrastructure of the EC2 instance on which the benchmarking script is run.

Instance class Operating System vCPU Memory (GiB)
r5.4xlarge Amazon Linux 2023 16 128

The following table summarizes the infrastructure of each RDS instance.

Instance name Instance class vCPU Memory (GiB) Storage Dedicated Log Volume
dlv db.r5.16xlarge 64 512 IO2 (10,000 PIOPS) On
withoutdlv db.r5.16xlarge 64 512 IO2 (10,000 PIOPS) Off

Database configuration parameters

Like other relational database engines, the performance of Amazon RDS for PostgreSQL is directly impacted by database configuration parameters. Parameters related to commit, checkpoint, and WAL configuration are vital for evaluating performance. To obtain statistics for WAL, it’s necessary to enable the track_wal_io_timing parameter. Throughout the test, the default parameters of Amazon RDS for PostgreSQL 15.5 are used; the following are the list of key parameters and their settings:

SELECT name, setting
FROM pg_settings
WHERE name IN ('synchronous_commit', 'commit_delay', 'checkpoint_timeout', 'wal_writer_delay', 'wal_buffers', 'shared_buffers', 'wal_compression', 'wal_writer_flush_after', 'wal_segment_size','track_wal_io_timing');
Output:
          name          | setting
------------------------+----------
 checkpoint_timeout     | 300
 commit_delay           | 0
 shared_buffers         | 16324847
 synchronous_commit     | on
 track_wal_io_timing    | on
 wal_buffers            | 8192
 wal_compression        | zstd
 wal_segment_size       | 67108864
 wal_writer_delay       | 200
 wal_writer_flush_after | 128

Benchmarking script

To achieve the most accurate performance comparisons, we benchmarked DLV and non-DLV instances independently using the following script. Prior to benchmarks, we reset the database’s WAL statistics with the pg_stat_reset_shared function to make sure results reflect the specific test conditions rather than previous activity. Our performance evaluation used the pgbench tool to create a TPC-B-like workload, simulating 64 concurrent clients processing half a million transactions each against data scale (-s) of 10,000. TPC-B is a widely recognized industry standard benchmark that focuses on measuring transaction throughput (transactions per second), providing a valuable metric for comparing the performance of different database systems.

The following is the script:

#!/bin/bash
# Function to print usage information
print_usage() {
        echo "Usage:"
        echo "To load data ==>  $0 load <SCALE> <AWS_ACCOUNT_ALIAS> <AWS_REGION> <DLV_INSTANCE> <NON_DLV_INSTANCE>"
        echo "To run benchmarks ==>  $0 run <NUM_TRANSACTIONS> <NUM_CLIENTS> <NUM_THREADS> <AWS_ACCOUNT_ALIAS> <AWS_REGION> <DLV_INSTANCE> <NON_DLV_INSTANCE>"
        echo "To load data and run benchmarks ==>  $0 load_run <SCALE> <NUM_TRANSACTIONS> <NUM_CLIENTS> <NUM_THREADS> <AWS_ACCOUNT_ALIAS> <AWS_REGION> <DLV_INSTANCE> <NON_DLV_INSTANCE>"
        exit 1
}
# Function to check if an RDS instance exists
check_instance_exists() {
        INSTANCE=$1
        AWS_REGION=$2
        aws rds describe-db-instances --db-instance-identifier $INSTANCE --region $AWS_REGION >/dev/null 2>&1
        if [ $? -ne 0 ]; then
                echo "Error: RDS instance '$INSTANCE' does not exist in region '$AWS_REGION'."
                exit 1
        fi
}
# Function to retrieve the PostgreSQL password from AWS Secrets Manager
get_pg_password() {
        SECRET_ID="rds/db_master_password"
        AWS_REGION=$1
        aws secretsmanager get-secret-value --secret-id $SECRET_ID --region $AWS_REGION --query SecretString --output text | jq -r '.master_user_password'
}
# Check the command and set the required number of arguments accordingly
if [ "$#" -lt 2 ]; then
        print_usage
fi
COMMAND=$1
shift
# Create logs directory
mkdir -p $HOME/DLV/logs
# Redirect stdout and stderr to a log file
exec >$HOME/DLV/logs/dlv-blog-$COMMAND-$(date +"%d-%m-%Y-%H%M").log 2>&1
# Log the start time and parameters
echo "Command: $COMMAND"
load_data() {
        SCALE=$1
        AWS_ACCOUNT_ALIAS=$2
        AWS_REGION=$3
        DLV_INSTANCE=$4
        NON_DLV_INSTANCE=$5
        PGPASSWORD=$(get_pg_password $AWS_REGION)
        export PGPASSWORD
        # Check if instances exist
        check_instance_exists $DLV_INSTANCE $AWS_REGION
        check_instance_exists $NON_DLV_INSTANCE $AWS_REGION
        echo "Load parameters: SCALE=$SCALE, AWS_ACCOUNT_ALIAS=$AWS_ACCOUNT_ALIAS, AWS_REGION=$AWS_REGION, DLV_INSTANCE=$DLV_INSTANCE, NON_DLV_INSTANCE=$NON_DLV_INSTANCE"
        # Load data into dlv instance
        nohup pgbench -i -q -s $SCALE -h $DLV_INSTANCE.$AWS_ACCOUNT_ALIAS.$AWS_REGION.rds.amazonaws.com -U postgres -d postgres >$HOME/DLV/logs/dlv-blog-load-dlv-t-$SCALE-$(date +"%d-%m-%Y-%H%M").log 2>&1 &
        # Load data into withoutdlv instance
        nohup pgbench -i -q -s $SCALE -h $NON_DLV_INSTANCE.$AWS_ACCOUNT_ALIAS.$AWS_REGION.rds.amazonaws.com -U postgres -d postgres >$HOME/DLV/logs/dlv-blog-load-withoutdlv-t-$SCALE-$(date +"%d-%m-%Y-%H%M").log 2>&1 &
        # Wait for data loading to complete
        wait
        echo "Data loaded"
}
run_benchmarks() {
        NUM_TRANSACTIONS=$1
        NUM_CLIENTS=$2
        NUM_THREADS=$3
        AWS_ACCOUNT_ALIAS=$4
        AWS_REGION=$5
        DLV_INSTANCE=$6
        NON_DLV_INSTANCE=$7
        PGPASSWORD=$(get_pg_password $AWS_REGION)
        export PGPASSWORD
        # Check if instances exist
        check_instance_exists $DLV_INSTANCE $AWS_REGION
        check_instance_exists $NON_DLV_INSTANCE $AWS_REGION
        echo "Run parameters: NUM_TRANSACTIONS=$NUM_TRANSACTIONS, NUM_CLIENTS=$NUM_CLIENTS, NUM_THREADS=$NUM_THREADS, AWS_ACCOUNT_ALIAS=$AWS_ACCOUNT_ALIAS, AWS_REGION=$AWS_REGION, DLV_INSTANCE=$DLV_INSTANCE, NON_DLV_INSTANCE=$NON_DLV_INSTANCE"
        # Clear WAL statistics
        echo "Clearing WAL statistics"
        echo "Clearing WAL statistics for dlv"
        psql -h $DLV_INSTANCE.$AWS_ACCOUNT_ALIAS.$AWS_REGION.rds.amazonaws.com -U postgres -c "SELECT pg_stat_reset_shared('wal');"
        echo "Clearing WAL statistics for withoutdlv"
        psql -h $NON_DLV_INSTANCE.$AWS_ACCOUNT_ALIAS.$AWS_REGION.rds.amazonaws.com -U postgres -c "SELECT pg_stat_reset_shared('wal');"
        # Start benchmarks
        echo "Starting benchmarks"
        nohup pgbench postgres -c $NUM_CLIENTS -j $NUM_THREADS -t $NUM_TRANSACTIONS -N -b tpcb-like -P 60 -h $DLV_INSTANCE.$AWS_ACCOUNT_ALIAS.$AWS_REGION.rds.amazonaws.com -U postgres >$HOME/DLV/logs/dlv-blog-run-dlv-t-$NUM_TRANSACTIONS-$NUM_CLIENTS-$NUM_THREADS-$(date +"%d-%m-%Y-%H%M").log 2>&1 &
        nohup pgbench postgres -c $NUM_CLIENTS -j $NUM_THREADS -t $NUM_TRANSACTIONS -N -b tpcb-like -P 60 -h $NON_DLV_INSTANCE.$AWS_ACCOUNT_ALIAS.$AWS_REGION.rds.amazonaws.com -U postgres >$HOME/DLV/logs/dlv-blog-run-withoutdlv-t-$NUM_TRANSACTIONS-$NUM_CLIENTS-$NUM_THREADS-$(date +"%d-%m-%Y-%H%M").log 2>&1 &
        # Wait for benchmarks to complete
        wait
        # Retrieve WAL statistics
        echo "Getting WAL statistics"
        echo "WAL statistics for dlv instance"
        psql -h $DLV_INSTANCE.$AWS_ACCOUNT_ALIAS.$AWS_REGION.rds.amazonaws.com -U postgres <<EOF
x
SELECT * FROM pg_stat_wal;
x
EOF
        echo "WAL statistics for withoutdlv instance"
        psql -h $NON_DLV_INSTANCE.$AWS_ACCOUNT_ALIAS.$AWS_REGION.rds.amazonaws.com -U postgres <<EOF
x
SELECT * FROM pg_stat_wal;
x
EOF
        echo "All done"
}
if [ "$COMMAND" == "load" ]; then
        # Check if the required arguments for load are provided
        if [ "$#" -ne 5 ]; then
                print_usage
        fi
        load_data $@
elif [ "$COMMAND" == "run" ]; then
        # Check if the required arguments for run are provided
        if [ "$#" -ne 7 ]; then
                print_usage
        fi
        run_benchmarks $@
elif [ "$COMMAND" == "load_run" ]; then
        # Check if the required arguments for load_run are provided
        if [ "$#" -ne 8 ]; then
                print_usage
        fi
        SCALE=$1
        NUM_TRANSACTIONS=$2
        NUM_CLIENTS=$3
        NUM_THREADS=$4
        AWS_ACCOUNT_ALIAS=$5
        AWS_REGION=$6
        DLV_INSTANCE=$7
        NON_DLV_INSTANCE=$8
        load_data $SCALE $AWS_ACCOUNT_ALIAS $AWS_REGION $DLV_INSTANCE $NON_DLV_INSTANCE
        # Pause to allow for more accurate visual inspection of Performance Insights for benchmarking tests
        echo "sleeping"
        echo "sleep starts: $(date +"%d-%m-%Y-%H%M%S")"
        sleep 600
        echo "sleep ends: $(date +"%d-%m-%Y-%H%M%S")"
        run_benchmarks $NUM_TRANSACTIONS $NUM_CLIENTS $NUM_THREADS $AWS_ACCOUNT_ALIAS $AWS_REGION $DLV_INSTANCE $NON_DLV_INSTANCE
else
        echo "Invalid command: $COMMAND. Use 'load' to load data, 'run' to run benchmarks, or 'load_run' to do both."
        print_usage
fi

Save the script to a file named dlv_bm.sh and make it executable. To run this script, use the following command. You can adjust the benchmarking parameters as needed:

nohup ./dlv_bm.sh load_run <SCALE> <NUM_TRANSACTIONS> <NUM_CLIENTS> <NUM_THREADS> <AWS_ACCOUNT_ALIAS> <AWS_REGION>  <DLV_INSTANCE> <NON_DLV_INSTANCE> &
  • SCALE = scale factor i.e Multiply the number of rows generated by the scale factor
  • NUM_TRANSACTIONS = Number of transactions within the interval
  • NUM_CLIENTS = Number of concurrent database sessions
  • NUM_THREADS = Number of worker threads
  • AWS_ACCOUNT_ALIAS = AWS account alias
  • AWS_REGION = The region where your resources are located.
  • DLV_INSTANCE = The name of your RDS instance with DLV
  • NON_DLV_INSTANCE = The name of your RDS instance without DLV

For example:

nohup ./dlv_bm.sh load_run 10000 500000 64 32 ab24x43b223s us-west-1 dlv withoutdlv &amp;

Upon completion of the script, the following three log files are generated for further analysis:

The two key log files for the comparison are the ones with the dlv-blog-run-dlv-*.log and dlv-blog-run-withoutdlv-*.log patterns. These represent the runs with Dedicated Log Volume enabled and disabled, respectively.

By running a diff command on these two log files, we can directly compare the transaction performance metrics reported by pgbench.

For example:

diff dlv-blog-run-withoutdlv-t-500000-64-32-28-10-2024-0249.log dlv-blog-run-dlv-t-500000-64-32-28-10-2024-0249.log -y|grep "tps = "

Following is the output from the benchmark logs we did:

Monitoring DLV

This section explores various techniques to monitor an RDS for PostgreSQL instance with DLV enabled.

Monitor DLV using Amazon CloudWatch

Keeping an eye on DLV that stores transaction logs (WALs) is just as important as monitoring other database volumes. Because the speed of writing WALs directly affects transaction performance, monitoring IOPS, latency, and throughput of the DLV is crucial for understanding overall database performance. You can use Amazon CloudWatch to monitor these metrics. The post Enhance database performance with Amazon RDS dedicated log volumes shows relevant CloudWatch metrics for DLV usage monitoring.

Monitor DLV using PostgreSQL’s pg_stat_wal

In addition to CloudWatch metrics, the PostgreSQL view pg_stat_wal, available in version 14 and later, can help you monitor WAL performance in your RDS for PostgreSQL instance. This view is part of PostgreSQL’s Cumulative Statistics System and provides detailed statistics about WAL activity.

Evaluating performance

To assess the impact of DLV on database performance, we reviewed following metrics:

  • Runtime
  • Transactions per second (TPS) and latency average recorded by pgbench
  • PostgreSQL’s cumulative statistic, wal_sync_time and wal_write_time, captured from pg_stat_wal
  • Graphs from Amazon RDS Performance Insights

Runtime

The runtime represents the duration required for processing all 32 million transactions using pgbench, with each client handling half a million transactions. The runtime underwent a significant reduction, nearly halving from 77 minutes for instances without DLV to just 40 minutes for instances with DLV enabled. This indicated a 92.5% improvement in processing time, emphasizing the substantial improvement powered by the instance with DLV.

Metric Without DLV With DLV % Improvement
Time taken by the benchmark (minutes) 77 40 92.5%

TPS and latency

The following table shows the performance improvement by the instance with DLV enabled. TPS experienced an increase of 91.83%, nearly doubling the system’s capacity to process transactions. Simultaneously, latency, a measure of response time, was cut in half, with an improvement of by 95.56%. This reduction in latency translates to a more responsive and efficient system. Overall, the implementation of DLV has led to a considerable enhancement in system performance, making it faster and more capable of handling a higher volume of tasks.

Metric Without DLV With DLV % Improvement
TPS 6984 13397 91.83%
Latency average (milliseconds) 8.8 4.5 95.56%

WAL statistics

We closely examined key WAL statistics from pg_stat_wal like wal_write_time (total time spent writing WAL buffers to disk) and wal_sync_time (total time spent syncing WAL files). DLV demonstrated improvement in both metrics: 102.6% for sync time and 9.2% for write time in our benchmark. This translates to faster transaction commits and a noticeable boost in overall database responsiveness, making DLV a good fit for optimizing PostgreSQL in high-demand scenarios.

Metric Without DLV With DLV % Improvement
wal_write_time (milliseconds) 56656.796 51882.614 9.2
wal_sync_time (milliseconds) 4431621.33 2186910.078 102.6

RDS Performance Insights

The following are graphs captured from RDS Performance Insights during the time period of benchmarking.

The first graph shows database load and top waits with DLV.

The following graph show database load and top waits without DLV.

The comparison in Performance Insights data highlighted a significant difference in resource consumption between DLV and non-DLV instances. The DLV-enabled instance consistently exhibited a lower average active session count, indicating a decrease in the overall demand for resources on the database. This improvement was attributed to DLV’s ability to offload the demanding task of WAL segment writing to a separate volume, freeing up the main database storage for other operations.

The standard RDS instance (without DLV) showed a higher prevalence of wait events, particularly those related to lightweight locks (LWLock). This often signals I/O contention, where the database is struggling to keep up with data write demands. In contrast, the DLV-enabled instance experienced a drastic reduction in these wait events, leading to more efficient data retrieval.

The wait details are as follows:

  • LWLock:WALWrite – This wait event was substantially lower in the DLV-enabled instance (12.97% of AAS) compared to the non-DLV instance (35.69% of AAS), highlighting DLV’s effectiveness in mitigating WAL-related bottlenecks.
  • IO:WALSync – This wait event is higher in the non-DLV instance with 0.83% of AAS compared to the DLV instance with 0.65% of AAS. Although it’s less notable, this wait event was still slightly higher in the non-DLV instance, further supporting DLV’s role in improving WAL write performance.

Benchmarking results confirmed DLV’s effectiveness in mitigating WAL-related wait events, aligning with its intended function. This validation demonstrates DLV’s success in offloading WAL writes, leading to enhanced database performance.

Clean up

If you are testing this solution, complete the following steps to remove resources and avoid charges

  1. On the Amazon EC2 console, select the Linux EC2 instance, and on the Instance state menu, choose Terminate instance.
  2. On the Amazon RDS console, select the RDS for PostgreSQL instances, and on the Actions menu, choose Delete.

Conclusion

In this post, we demonstrated how DLV can empower your RDS database with enhanced write performance, reducing WAL contention, a common bottleneck in write-intensive workloads. By offloading WAL writes to a dedicated volume, DLV frees up your main database resources, enabling smoother data access and overall improved responsiveness. This translates to faster write operations, reduced latency, and a more efficient database environment, which is especially crucial for applications demanding consistent high performance.

Furthermore, DLV alleviates storage bottlenecks by isolating WAL activity, making sure critical write operations aren’t hindered by other database processes. This results in a more stable and predictable performance, even under heavy load. If your RDS workload involves frequent writes or demands low-latency transactions, enabling DLV is a strategic move to unlock your database’s full potential and achieve optimal performance under pressure.

DLV is compatible with PIOPS storage types (io1 and io2 Block Express) and are provisioned with a fixed size of 1,024 GiB and 3,000 Provisioned IOPS. They are supported in Amazon RDS for PostgreSQL versions 13.10 and higher, 14.7 and higher, and 15.2 and higher, across all Regions.

Do you have follow-up questions or feedback? Leave a comment. We’d love to hear your thoughts and suggestions.


About the Authors

Naga Appani is a Database Engineer at Amazon Web Services and Subject Matter Expert for RDS PostgreSQL, he has over a decade of experience in working with relational databases. At AWS, He is mainly focused on PostgreSQL deployments, he works with developers in bringing new features and assist customers in resolving critical issues with Aurora and RDS PostgreSQL Database systems.

Belma Canik is a Senior Database Specialist Technical Account Manager (STAM) at Amazon Web Services. She helps customers run optimized workloads on AWS and make the best out of their cloud journey especially in Amazon RDS and Aurora Database Engines.

Sid Vantair is a Solutions Architect with AWS covering Strategic accounts. With over a decade of experience in working with relational databases, he thrives on resolving complex technical issues to overcome customer hurdles. Outside of work, he cherishes spending time with his family and fostering inquisitiveness in his children.

Read MoreAWS Database Blog

RELATED ARTICLES

LEAVE A REPLY

Please enter your comment!
Please enter your name here

Most Popular

Recent Comments