Collecting database performance metrics is critical for troubleshooting performance issues. AWS offers several services to gather and analyze database performance metrics, including Amazon Relational Database Service (Amazon RDS) Enhanced Monitoring, Amazon CloudWatch, and Amazon RDS Performance Insights. While these services provide detailed insights on how your database is performing, sometimes you need a different analysis method to troubleshoot a specific database performance problem. Additionally, many database administrators prefer command line tools to a graphical user interface (GUI) for analyzing database performance quickly and repeatedly.
Periodic metrics analysis is an analysis technique where you review the same set of metrics collected at consistent intervals over a period of time. This technique is helpful for comparing the behaviors of a database over an interval of time to see if there are opportunities to improve or maintain performance as the workload changes.
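To make the technique concrete, here is a minimal Python sketch of periodic metrics analysis. It assumes snapshots of a cumulative counter taken at regular intervals (the field xact_commit is modeled on the PostgreSQL pg_stat_database view) and derives a per-second rate for each interval. The snapshot values and field names such as snap_id and taken_at are made up for illustration and are not PGSnapper output.

```python
from datetime import datetime

# Illustrative snapshots of a cumulative counter, taken one minute apart.
# All values are invented for this sketch.
snapshots = [
    {"snap_id": 1, "taken_at": datetime(2023, 1, 1, 10, 0), "xact_commit": 1_000},
    {"snap_id": 2, "taken_at": datetime(2023, 1, 1, 10, 1), "xact_commit": 1_600},
    {"snap_id": 3, "taken_at": datetime(2023, 1, 1, 10, 2), "xact_commit": 4_000},
]


def interval_rates(snaps, counter):
    """Return per-second rates of a cumulative counter between consecutive snapshots."""
    rates = []
    for prev, curr in zip(snaps, snaps[1:]):
        seconds = (curr["taken_at"] - prev["taken_at"]).total_seconds()
        delta = curr[counter] - prev[counter]
        rates.append(
            {
                "from_snap": prev["snap_id"],
                "to_snap": curr["snap_id"],
                "per_second": delta / seconds,
            }
        )
    return rates


for row in interval_rates(snapshots, "xact_commit"):
    print(row)
```

Comparing the rate of one interval against the next is what reveals a change in workload. The sketch does not handle counter resets, which a real analysis would need to account for.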
PGSnapper is an open-source tool developed by AWS Database Partner Solutions Architects that can assist with periodic collection (snapping) of Amazon RDS for PostgreSQL or Amazon Aurora PostgreSQL-Compatible Edition performance-related statistics and metrics. In this post, we review how PGSnapper works and its use cases. We then walk through how to set up PGSnapper using AWS CloudFormation, load PGSnapper-generated output to a PostgreSQL database, and analyze the data using provided sample SQL queries to derive insights.
The following diagram shows the architecture that you set up in this walkthrough.
PGSnapper is written in Python. It uses a JSON config file that you can customize to include database dictionary views and queries, whose output you want to capture periodically.
PGSnapper runs on an Amazon Elastic Compute Cloud (Amazon EC2) instance with access to the PostgreSQL instance to be monitored. It retrieves database credentials stored in AWS Secrets Manager. To minimize impact on the database, each PGSnapper run exports the output of the queries being monitored to a separate CSV-formatted file on the EC2 instance. When you’re ready to analyze the output, you run a one-time packaging step to collect other database stats and generate Data Definition Language (DDL) commands for creating the staging tables used for import. You can load the output generated by PGSnapper to any PostgreSQL database (for example, dev) using the Loader script included in the tool and perform analysis using the provided sample queries. You can also write your own custom SQL queries to analyze the output however you want.
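The idea of exporting each run to its own CSV file can be sketched in a few lines of Python. This is only an illustration of the approach, not PGSnapper's actual code: the function write_snapshot, the file naming scheme, and the snap_id column are all hypothetical.

```python
import csv
import tempfile
from pathlib import Path


def write_snapshot(output_dir, query_name, snap_id, columns, rows):
    """Write one query's result set for one snapshot to its own CSV file."""
    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # Hypothetical naming scheme: <query_name>_<snap_id>.csv
    path = out_dir / f"{query_name}_{snap_id}.csv"
    with path.open("w", newline="") as handle:
        writer = csv.writer(handle)
        # Tag every row with the snapshot ID so runs can be compared later.
        writer.writerow(["snap_id", *columns])
        for row in rows:
            writer.writerow([snap_id, *row])
    return path


# Demo with made-up data, written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    csv_path = write_snapshot(
        tmp, "pg_stat_database", 1, ["datname", "xact_commit"], [("testapp", 1000)]
    )
    print(csv_path.read_text())
```

Writing to local files keeps the monitored database free of staging tables and extra writes, which is consistent with the low-impact goal described above.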
The PGSnapper setup also creates an Amazon Simple Storage Service (Amazon S3) staging bucket that you can use for storing and sharing the PGSnapper-generated output with other teams.
PGSnapper use cases
After PGSnapper is set up and scheduled to run periodically, you can use it for several different use cases, such as the following:
Analyze database performance following a load test
Dig deeper into database metrics to find the top resource-intensive queries if your database has a performance issue during a particular hour of the day
Collect and share database performance metrics for your production database with another team, such as your performance engineering team, without giving them direct access to production
We look at some examples of these use cases later in this post.
Complete the following prerequisite steps before setting up PGSnapper. Note that some of the DB parameters updated in this section are static and require an instance reboot to take effect.
Make sure you have a PostgreSQL instance to use for this walkthrough. If not, you can create an RDS for PostgreSQL instance or Aurora PostgreSQL-Compatible cluster.
If your PostgreSQL instance is hosted in a private subnet, ensure that an internet gateway is attached to the VPC and a NAT gateway has been created in the public subnet. This allows the PGSnapper EC2 instance, deployed in the private subnet, to download the necessary packages from the internet during the bootstrapping process. You can follow the AWS documentation to deploy a NAT gateway.
When you create a new RDS for PostgreSQL DB instance or Aurora PostgreSQL DB cluster, it comes with default parameter groups, which can’t be updated. For RDS for PostgreSQL, create a custom DB parameter group and associate it with the RDS instance. For Aurora PostgreSQL, create a custom cluster parameter group along with a custom DB parameter group. Associate the cluster parameter group with the Aurora cluster and the DB parameter group with the primary DB instance and the Aurora replicas.
Modify the shared_preload_libraries DB parameter and add the pg_stat_statements extension if it isn’t already listed. You can set this up in the DB parameter group for RDS for PostgreSQL and the cluster parameter group for Aurora PostgreSQL.
Modify the track_functions parameter and set it to all to track procedural language, SQL, and C language functions. You can set this up in the DB parameter group for RDS for PostgreSQL and the cluster parameter group for Aurora PostgreSQL.
Set the track_activity_query_size parameter to 102400 to capture the full text of very long SQL statements. You can set this up in the DB parameter group for RDS for PostgreSQL and the cluster parameter group for Aurora PostgreSQL.
Verify and save the parameter updates.
The following image shows the parameters that you modify as part of these prerequisites.
Reboot the database instance. The shared_preload_libraries and track_activity_query_size parameters are static and require an instance reboot for them to take effect.
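If you prefer to script the parameter changes instead of using the console, the following Python sketch builds the parameter list in the shape that the boto3 RDS client's modify_db_parameter_group call accepts (Aurora cluster parameter groups use modify_db_cluster_parameter_group instead). The helper names, the example library value pgaudit, and the parameter group name are hypothetical. The sketch assumes track_functions can be applied immediately, while the two static parameters are marked pending-reboot.

```python
def add_library(current_value, library="pg_stat_statements"):
    """Append a library to a comma-separated shared_preload_libraries value if missing."""
    libraries = [item.strip() for item in current_value.split(",") if item.strip()]
    if library not in libraries:
        libraries.append(library)
    return ",".join(libraries)


def build_parameters(current_preload_libraries):
    """Build a Parameters list for the prerequisite settings described in the text."""
    return [
        {
            "ParameterName": "shared_preload_libraries",
            "ParameterValue": add_library(current_preload_libraries),
            "ApplyMethod": "pending-reboot",  # static parameter
        },
        {
            "ParameterName": "track_functions",
            "ParameterValue": "all",
            "ApplyMethod": "immediate",  # assumed to be dynamic
        },
        {
            "ParameterName": "track_activity_query_size",
            "ParameterValue": "102400",
            "ApplyMethod": "pending-reboot",  # static parameter
        },
    ]


# "pgaudit" stands in for whatever value the parameter currently holds.
parameters = build_parameters("pgaudit")
print(parameters)

# Applying the change needs boto3 and AWS credentials, so it is left commented out.
# "my-custom-pg-params" is a hypothetical parameter group name.
# import boto3
# boto3.client("rds").modify_db_parameter_group(
#     DBParameterGroupName="my-custom-pg-params", Parameters=parameters
# )
```

Appending to the existing shared_preload_libraries value, instead of overwriting it, avoids dropping libraries that are already loaded.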
Create a database user that PGSnapper can use to connect to the PostgreSQL instance for collecting database metrics. You can use psql or any PostgreSQL GUI client such as pgAdmin for running the command below after updating the placeholders in <>. For this post, we use psql to create a database user named pgsnapper as a member of the pg_monitor role, after logging in as the master user. Enter the master user password when prompted.
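As a hedged illustration of this step, the following Python sketch assembles a psql invocation that creates the pgsnapper user as a member of pg_monitor. It is not necessarily the exact command the authors used: the function name, the choice of the postgres database for the connection, and the placeholder values are assumptions.

```python
import shlex


def create_user_command(host, port, master_user, password):
    """Build a psql command that creates the pgsnapper user in the pg_monitor role."""
    # Double any single quotes so the password is a valid SQL string literal.
    escaped = password.replace("'", "''")
    sql = f"CREATE USER pgsnapper PASSWORD '{escaped}' IN ROLE pg_monitor"
    return [
        "psql",
        "-h", host,
        "-p", str(port),
        "-U", master_user,
        "-d", "postgres",  # assumed maintenance database
        "-c", sql,
    ]


# Placeholders in <> must be replaced with real values before running.
argv = create_user_command("<PG host>", 5432, "<master user>", "<password>")
print(shlex.join(argv))
```

Because the password appears on the command line in this form, it can end up in shell history, so treat the printed command with care.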
Set up PGSnapper using AWS CloudFormation
The provided CloudFormation template completes the following setup steps in your AWS account:
Stores the database user password in a Secrets Manager secret, which PGSnapper uses to connect to the PostgreSQL instance
Creates an EC2 instance with the latest Amazon Linux 2 AMI and deploys it in the same VPC and subnet as the PostgreSQL database instance
Bootstraps the EC2 instance by installing the AWS Systems Manager agent, PostgreSQL client, required Python packages, and staging PGSnapper scripts
Creates an S3 bucket for storing and sharing PGSnapper-generated output
Adds the EC2 instance’s security group as an allowed inbound source in the security group assigned to the PostgreSQL instance
Choose Launch Stack to deploy the CloudFormation stack in your AWS account in the Region where the PostgreSQL instance to be monitored is running.
Enter the stack parameters as shown in the following table.
VPC ID of PostgreSQL database instance to be monitored (for example, vpc-0343606e).
VPC subnet ID of the PostgreSQL database instance to be monitored (for example, subnet-a0246dcd).
Security group ID of the PostgreSQL database instance to be monitored (for example, sg-8c14mg64).
PGSnapper EC2 instance type. Leave the default value.
PGSnapper EC2 instance EBS Volume Size in GiB. Leave the default value.
Database username for the PostgreSQL instance to be monitored. For this post, specify the pgsnapper user that you created in the prerequisites section.
Database user password for the PostgreSQL instance to be monitored. For this post, enter the pgsnapper user password that you specified in the prerequisites section.
Port for the PostgreSQL instance to be monitored.
Select the acknowledgement check box and choose Create stack. Stack creation takes about 7 minutes to complete.
When it’s complete, choose the Outputs tab of the stack and note the resources that you need to configure PGSnapper. The following image shows the Outputs tab, where you can find the key resources and their corresponding values.
On the Amazon EC2 console, select the PGSnapper instance by referring to the value of the PGSnapperEC2InstID CloudFormation output key and choose Connect. On the Session Manager tab, choose Connect again.
Because Session Manager uses a user named ssm-user to connect to an EC2 instance by default, change to the user ec2-user by running the following command:
Load the pg_stat_statements extension into the PostgreSQL database where application-related objects are stored and which needs to be monitored. For this post, you use psql, which is already installed on the PGSnapper EC2 instance, to log in as the master user and create a new database named testapp. Once the database is created, load the extension into it. Enter the master user password when prompted.
Review PGSnapper usage by running the following command:
Save the PostgreSQL instance-related settings in environment variables for easy retrieval:
Run PGSnapper manually one time using the following command and review the log file generated under the /home/ec2-user/scripts/log/$PGHOST/$PGDATABASE/ directory. By default, all the output is stored under the /home/ec2-user/scripts/output/$PGHOST/$PGDATABASE/ directory. If you’re specifying the output directory using the -o argument, the path needs to be specified as an absolute path, for example /home/ec2-user/mysnapperoutput.
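The directory conventions in this step can be expressed as a short Python sketch. It only mirrors what the text states (default output and log locations derived from PGHOST and PGDATABASE, and an absolute path for a custom output directory); the function names are hypothetical and this is not PGSnapper's own validation logic. PurePosixPath is used so the paths behave the same on any operating system.

```python
import os
from pathlib import PurePosixPath

SCRIPTS_HOME = PurePosixPath("/home/ec2-user/scripts")


def default_dirs(pghost, pgdatabase):
    """Return the default (output_dir, log_dir) pair described in the text."""
    return (
        SCRIPTS_HOME / "output" / pghost / pgdatabase,
        SCRIPTS_HOME / "log" / pghost / pgdatabase,
    )


def resolve_output_dir(pghost, pgdatabase, override=None):
    """Use a custom output directory only if it is given as an absolute path."""
    if override is None:
        return default_dirs(pghost, pgdatabase)[0]
    path = PurePosixPath(override)
    if not path.is_absolute():
        raise ValueError(f"output directory must be an absolute path: {override}")
    return path


# PGHOST and PGDATABASE are the environment variables named in the text;
# the fallback values here are placeholders for illustration.
host = os.environ.get("PGHOST", "<PG host>")
database = os.environ.get("PGDATABASE", "testapp")
print(resolve_output_dir(host, database))
print(resolve_output_dir(host, database, "/home/ec2-user/mysnapperoutput"))
```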
Set up a crontab schedule in the EC2 instance to run PGSnapper every minute.
Press i to enter insert mode. Copy and paste the following after updating the placeholders in <>. Press Esc followed by :wq! to save and exit.
Verify crontab by entering the command below.
At this point, PGSnapper is set up and scheduled to run every minute. Next, you use pgbench, which is already installed on the PGSnapper EC2 instance, to generate some load on the testapp database that you created earlier.
Run the following command to initialize the pgbench load test. Enter the master user password when prompted.
Run the following command to run the pgbench load test. Enter the master user password when prompted. This step takes about 10 minutes to complete.
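For reference, the two pgbench phases (initialize, then run) can be sketched in Python as argument lists. The pgbench options used here (-i, -s, -c, -j, -T, -h, -p, -U) are standard, but the scale factor, client count, and thread count are assumptions chosen for illustration; only the 600-second duration is derived from the roughly 10 minutes mentioned in the text.

```python
import shlex


def pgbench_init(host, port, user, database, scale=100):
    """Build the argv for initializing the pgbench tables at a given scale factor."""
    return [
        "pgbench", "-i", "-s", str(scale),
        "-h", host, "-p", str(port), "-U", user,
        database,
    ]


def pgbench_run(host, port, user, database, clients=10, threads=2, seconds=600):
    """Build the argv for a timed pgbench run (600 seconds is about 10 minutes)."""
    return [
        "pgbench",
        "-c", str(clients),
        "-j", str(threads),
        "-T", str(seconds),
        "-h", host, "-p", str(port), "-U", user,
        database,
    ]


# Placeholders in <> must be replaced with real values before running.
print(shlex.join(pgbench_init("<PG host>", 5432, "<master user>", "testapp")))
print(shlex.join(pgbench_run("<PG host>", 5432, "<master user>", "testapp")))
```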
When the load test is complete, edit crontab and comment out the scheduled PGSnapper job by adding # in front of it. Press Esc followed by :wq! to save and exit.
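The two crontab edits in this walkthrough (adding an every-minute entry, then commenting it out after the load test) can be sketched in Python as plain string operations. The function names and the job path are hypothetical; the real entry would contain the PGSnapper command with your own values.

```python
def cron_entry(command, every_minutes=1):
    """Build a crontab line that runs a command every N minutes."""
    return f"*/{every_minutes} * * * * {command}"


def comment_out(crontab_text, marker):
    """Prefix '#' to every active crontab line that contains the marker."""
    lines = []
    for line in crontab_text.splitlines():
        if marker in line and not line.lstrip().startswith("#"):
            line = "#" + line
        lines.append(line)
    return "\n".join(lines)


# "/path/to/pgsnapper_job.sh" is a stand-in for the real PGSnapper command.
entry = cron_entry("/path/to/pgsnapper_job.sh")
print(entry)
print(comment_out(entry, "pgsnapper_job.sh"))
```

Commenting the line out, instead of deleting it, makes it easy to resume collection later by removing the # again.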
Package the output
Now you are ready to package the PGSnapper-generated output files and import them into a PostgreSQL database for analysis. Packaging exports the output of the other database dictionary views and queries listed in the PGSnapper config file that need to be snapped only once. It also generates a DDL command file named all_ddls.sql, which the Loader script, described in the next section, uses to create staging tables and import the output generated by PGSnapper.
Package the PGSnapper output by running the following command:
Verify that packaging was successful by viewing the PGSnapper logfile.
PostgreSQL performance stats Loader script
Now you are ready to load PGSnapper-generated output into any PostgreSQL database to perform your analysis. As a best practice for production workloads, use a separate PostgreSQL instance to load PGSnapper-generated output. For instructions on configuring another instance, refer to Setup in the GitHub repo.
For this post, you use the same PostgreSQL instance that you used earlier to generate the PGSnapper output.
Create a new database to load PGSnapper-generated output. For this post, you create a new database named testappsnaps by logging in to the PostgreSQL instance as the master user. Enter the master user password when prompted.
Import the PGSnapper-generated output by running the following command. For the -o argument, provide the absolute path of the directory that stores all the PGSnapper-generated CSV output files, including the all_ddls.sql file. By default, this path is /home/ec2-user/scripts/output/$PGHOST/$PGDATABASE/.
Verify that the data loading was successful by viewing the loader script logfile /home/ec2-user/scripts/log/pg_perf_stat_loader.log.
Download sample queries for PGSnapper-generated output analysis
Sample queries for analyzing PGSnapper-generated output are available in GitHub. Download the SQL files to a machine where psql is installed, and which has access to the PostgreSQL instance where you imported the PGSnapper-generated output. For this post, you can use the PGSnapper EC2 instance.
Download the sample SQL files by running the following commands:
Run SQL queries for PGSnapper-generated output analysis
PGSnapper includes sample queries for various PostgreSQL instance and database-related stats, session stats, SQL stats, table stats, and index stats. For this post, you run a subset of these queries to view database and schema sizes, database-related statistics, and the top 20 SQL statements by elapsed time.
Log in to the testappsnaps database where you loaded the PGSnapper-generated output. Enter the pgsnapper database user password when prompted.
Run snappermenu.sql to find all the sample queries. The following screenshot shows only a subset of all the queries.
List all the snapshots that PGSnapper collected, along with their sample start and end times, by running the following SQL file:
Set the begin and end snapshot IDs for analysis before running any of the analysis queries:
Run the following SQL file to see database and schema sizes:
Run the following SQL file to see database-related stats:
Run the following SQL file to list the top 20 queries by elapsed time between the snapshot IDs you set earlier. Depending on your database version, snappermenu.sql shows the appropriate version (top_20_sqls_by_elapsed_time_v2.sql or top_20_sqls_by_elapsed_time.sql) of the SQL file.
Run the following SQL file and enter the query ID of interest to see detailed stats of that query for each snapshot. Depending on your database version, snappermenu.sql will show the appropriate version (sql_stat_history_v2.sql or sql_stat_history.sql) of the SQL file.
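The reasoning behind a "top queries by elapsed time" report over a snapshot range can be illustrated with a small Python sketch. It assumes each snapshot maps a query ID to a cumulative elapsed time in milliseconds and ranks queries by how much that value grew between the begin and end snapshots. The data and function name are invented for illustration, and the provided SQL files may compute the ranking differently.

```python
def top_queries_by_elapsed(begin_snap, end_snap, limit=20):
    """Rank queries by the growth in cumulative elapsed time between two snapshots.

    Each snapshot maps a query ID to its cumulative elapsed time in milliseconds.
    """
    deltas = []
    for query_id, end_ms in end_snap.items():
        # Queries first seen after the begin snapshot start from zero.
        delta = end_ms - begin_snap.get(query_id, 0.0)
        if delta > 0:
            deltas.append((query_id, delta))
    deltas.sort(key=lambda item: item[1], reverse=True)
    return deltas[:limit]


# Made-up cumulative elapsed times (ms) at the begin and end snapshots.
begin = {101: 500.0, 102: 1_200.0, 103: 50.0}
end = {101: 2_500.0, 102: 1_300.0, 103: 50.0, 104: 900.0}

for query_id, elapsed_ms in top_queries_by_elapsed(begin, end, limit=3):
    print(query_id, elapsed_ms)
```

Ranking by the difference, and not by the raw cumulative value, is what ties the result to the chosen time window: query 102 has a large cumulative total in this example but contributed little during the interval.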
To uninstall PGSnapper and delete related AWS resources, complete the following steps:
If you previously saved any data in the S3 bucket created by the PGSnapper CloudFormation stack (CloudFormation output key: PGSnapperS3Bucket), go to the Amazon S3 console and empty that S3 bucket.
On the AWS CloudFormation console, delete the PGSnapper CloudFormation stack.
In this post, we covered how you can set up and use PGSnapper to periodically collect PostgreSQL database metrics and run customized queries to identify performance bottlenecks. Because you can load the PGSnapper-generated output into any PostgreSQL database, the analysis doesn’t impact your production database, and the metrics collection itself has very limited impact.
We encourage you to explore the other sample SQL files for analysis included in PGSnapper, beyond what we covered in this post. If you’re doing a load test, benchmarking a database, adding a new application service, scaling up your TPS or connection count, or looking to identify resource-intensive queries during a particular time window, consider using PGSnapper.
If you have comments or questions, leave a comment. To submit bug fixes or request enhancements, open a pull request or an issue in the PGSnapper GitHub repository.
About the Authors
Arabinda Pani, a principal partner solutions architect specializing in databases at AWS, brings years of experience from the field in managing, migrating, and innovating databases and data platforms at scale. In his role, Arabinda works with AWS Partners to provide guidance and technical assistance on database projects, helping them improve the value of their solutions when using AWS.
Abhinav Sarin is a senior partner solutions architect at AWS. His core interests include databases, data analytics, and machine learning. He works with AWS customers and partners to provide guidance and technical assistance on database projects, helping them improve the value of their solutions when using AWS.