Friday, May 3, 2024
No menu items!
HomeDatabase ManagementDetermining the optimal value for shared_buffers using the pg_buffercache extension in PostgreSQL

Determining the optimal value for shared_buffers using the pg_buffercache extension in PostgreSQL

In OLTP databases, the buffer cache is memory allocated for storing cached data and dirty pages. Cached data speeds up the retrieval of frequently accessed information, and reading from these cached pages minimizes disk I/O operations. Setting appropriate values for shared_buffers is important for optimal PostgreSQL performance and can lead to significant reductions in overall database operational costs. In this post, we discuss the importance of the shared_buffers configuration and guide you on determining its optimal value using the pg_buffercache extension. The optimization strategies outlined in this post are applicable to PostgreSQL, regardless of whether it’s self-managed or hosted on Amazon Aurora PostgreSQL-Compatible Edition or Amazon Relational Database Service (Amazon RDS) for PostgreSQL.

Overview of shared_buffers

The global memory area in PostgreSQL mainly consists of shared_buffers, wal_buffers, and CLOG buffers. The shared_buffers memory area is used to store cached data and dirty pages. A dirty page is a written or modified data page in shared_buffers as the result of a write operation against index or table data. These dirty pages are flushed to the disk by a background writer process when the number of clean shared_buffers appears to be insufficient. The objective of this process is to ensure that the data pages are moved to permanent storage and free space is available for use in shared_buffers. Furthermore, checkpoints are automatically issued in the background based on checkpoint settings writing all dirty pages to disk to create a restore point in the event of a crash. Amazon Aurora PostgreSQL uses log records for higher availability instead of checkpoints for crash recovery. The following diagram illustrates the memory components of PostgreSQL.

The default configuration of shared_buffers varies depending on the PostgreSQL deployment. In the community version of PostgreSQL, the default value for shared_buffers is set at 128 MB. However, this value might be lower if your kernel settings are not supportive, as determined during the initdb process. For Amazon RDS for PostgreSQL, the default is calculated using the formula DBInstanceClassMemory/32768. Both community PostgreSQL and Amazon RDS for PostgreSQL lean heavily on the operating system for caching, making an allocation of 30–35% of memory to shared_buffers ideal. Conversely, in Amazon Aurora PostgreSQL, the default value is derived from the formula SUM(DBInstanceClassMemory/12038, -50003). This difference stems from the fact that Amazon Aurora PostgreSQL does not depend on the operating system for data caching. As a result, shared_buffers allocation in Amazon Aurora PostgreSQL is notably higher compared to that in Amazon RDS for PostgreSQL.

How to set shared_buffers

In both Amazon Aurora PostgreSQL and Amazon RDS for PostgreSQL, you can establish a custom parameter group to alter any parameter configuration. The shared_buffers setting is adjusted through this custom parameter group. Notably, shared_buffers is a static parameter, meaning any modifications require a reboot of the database instance for the changes to take effect. This parameter is defined as a number of 8 kB blocks. To determine the value of shared_buffers for Amazon Aurora PostgreSQL and Amazon RDS for PostgreSQL within the RDS parameter group, employ the following formula:

shared_buffers = (Target percentage x Total RAM in GB x 1024 x 1024) /(100 x 8)

For example, if you’re running an RDS instance on the db.m6g.12xlarge instance class (192 GB of RAM), a value of 10,066,329 on the shared_buffers parameter will set 40% of total RAM:

(40 x 192 x 1024 x 1024) /(100 x 8)

To set the shared_buffers value in your parameter group, complete the following steps:

On the Amazon RDS console, choose Databases in the navigation pane.
Navigate to your database and in the Instance section, choose the link for your associated parameter group.

In the Modifiable parameters section, search for and modify the shared_buffers parameter in the custom parameter group using the preceding formula.

Applying the new value of shared_buffers requires an instance reboot.

shared_buffers and I/O utilization

In both Amazon Aurora PostgreSQL and Amazon RDS for PostgreSQL, for read-intensive workloads, the engine tries to fulfill the query request using pages already in the buffer cache. If the required pages are absent from the cache, the PostgreSQL database engine initiates reads against the storage layer to fetch them. When workloads are served entirely from the cache, there’s no read I/O consumption. However, when they aren’t, I/O charges apply for the pages sourced from storage. For example, if the engine scans 8 data pages to retrieve 10 tuples, it registers 8 read I/O operations.

Amazon CloudWatch has the metric BufferCacheHitRatio, which displays the percentage of requests catered by the buffer cache. A higher hit ratio implies reduced read I/O. Therefore, to curtail read I/O, shared_buffers should be expansive enough to house the majority of the working dataset. For a deeper understanding of I/O characteristics in Amazon RDS for PostgreSQL, refer to I/O size and volume throughput limits.

Benefits of analyzing shared_buffers

shared_buffers in PostgreSQL plays a critical role in database performance. By storing data and indexes in memory, shared_buffers can significantly reduce the number of disk I/O operations. This results in faster query performance and reduces overall database cost. shared_buffers uses the least recently used (LRU) algorithm for cached pages. The LRU caching algorithm removes the least recently used data pages when the cache is full, and a new page is referenced that is not in the cache.

When shared_buffers is too small, it could result in frequent reads from storage, and the pages in the buffer may have to be constantly flushed to and from the disk. This affects performance and lowers query concurrency. However, setting the value of shared buffers too high can result in increased memory pressure on the system, which can lead to swapping and decreased performance. Factors to consider when tuning the shared_buffers setting include the amount of memory available on the system, the size of the working dataset, and the workloads on the database. Analyzing the shared_buffers setting and making appropriate adjustments ensures that the PostgreSQL database is performing optimally and providing the best possible result for workloads.

Use AWS monitoring tools to determine shared_buffers modification

In Amazon Aurora PostgreSQL, the BufferCacheHitRatio CloudWatch metric can help you determine whether to modify shared_buffers or not. If BufferCacheHitRatio is below 90%, consider increasing shared_buffers to serve more data from cache. In Amazon RDS for PostgreSQL, the hit ratio can be identified by querying the pg_statio_user_tables and pg_statio_user_indexes system catalogs. The following query identifies the hit ratio on tables:

SELECT
SUM(heap_blks_read) as heap_read,
SUM(heap_blks_hit) as heap_hit,
SUM(heap_blks_hit) / (SUM(heap_blks_hit) + SUM(heap_blks_read)) as hit_ratio
FROM
pg_statio_user_tables;

The following query identifies hit ratio on indexes:

SELECT
SUM(idx_blks_read) as idx_read,
SUM(idx_blks_hit) as idx_hit,
(SUM(idx_blks_hit) – SUM(idx_blks_read)) / SUM(idx_blks_hit) as ratio
FROM
pg_statio_user_indexes;

To finalize your shared_buffers modification, in addition to CloudWatch metrics, you can use Amazon RDS Performance Insights. Performance Insights is a database performance tuning and monitoring feature that helps you quickly assess workloads on your database and determine next steps. Counter metrics such as blks_hits, blks_read, tup_fetched, and tup_returned can be used to better estimate shared_buffers. Performance Insights offers 7 days of performance data history at no charge. For pricing details, refer to Performance Insights Pricing. The following screenshot shows these counters for an Aurora PostgreSQL instance.

Analyze shared_buffers using the pg_buffercache extension

After analyzing the hit ratio, you can use the pg_buffercache extension to examine the contents of the shared buffer cache in real time. The pg_buffercache extension provides useful information about the buffer cache and helps you analyze the performance of a PostgreSQL database. pg_buffercache functions provide information about the number and size of buffers in the cache, the number of hits and misses, and the state of individual buffers. This information can be used to help diagnose performance problems and optimize database performance.

The pg_buffercache result set provides one row for every buffer in the shared cache. If a buffer is unused, all its fields will appear as null, except for the bufferid. A high number of cache misses can suggest that the shared_buffers value might be set too low, causing the database to engage in more disk I/O operations than required. If this is observed, adjust the shared_buffers value upward, which in turn can enlarge the buffer cache and decrease cache misses.

Let’s look at how to use pg_buffercache to analyze the buffer cache.

Connect to your Aurora PostgreSQL or RDS for PostgreSQL database using your preferred SQL client.
Run the following SQL statement to install the pg_buffercache extension:

CREATE EXTENSION pg_buffercache;

Run the following query to show the sorted relations based on the buffer cache used:

SELECT
c.relname, count(*) AS buffers
FROM
pg_buffercache b
INNER JOIN pg_class c ON b.relfilenode = pg_relation_filenode(c.oid)
AND
b.reldatabase IN (0, (SELECT oid FROM pg_database
WHERE
datname = current_database()))
GROUP BY c.relname
ORDER BY 2 DESC
LIMIT 10;

The following screenshot shows the output.

The preceding query provides the number of buffers used by each relation of the current database. Now, we run a sample workload on a sample database and analyze pg_buffercache.

Run the following query to identify the top 10 relations residing in shared_buffers and percentage utilization:

SELECT c.relname,
pg_size_pretty(count(*)*8192) AS buffer_size,
pg_size_pretty(pg_relation_size(c.oid)) as relation_size,
Round(100.0 * Count(*) / (SELECT setting
FROM pg_settings
WHERE name = ‘shared_buffers’) :: INTEGER, 2) AS buffers_percent,
ROUND(count(*)*8192*100/ pg_relation_size(c.oid)::numeric, 2 ) AS relation_percent,
CASE
WHEN c.relkind = ‘r’ THEN ‘table’
WHEN c.relkind = ‘i’ THEN ‘index’
WHEN c.relkind = ‘S’ THEN ‘sequence’
WHEN c.relkind = ‘t’ THEN ‘TOAST table’
WHEN c.relkind = ‘v’ THEN ‘view’
WHEN c.relkind = ‘m’ THEN ‘materialized view’
WHEN c.relkind = ‘c’ THEN ‘composite type’
WHEN c.relkind = ‘f’ THEN ‘foreign table’
WHEN c.relkind = ‘p’ THEN ‘partitioned table’
WHEN c.relkind = ‘I’ THEN ‘partitioned index’
ELSE ‘Unexpected relkind’
END as relation_type
FROM pg_class c
INNER JOIN pg_buffercache b
ON b.relfilenode = c.relfilenode
INNER JOIN pg_database d
ON ( b.reldatabase = d.oid
AND d.datname = Current_database() )
GROUP BY c.relname, c.oid
ORDER BY pg_total_relation_size(c.oid) DESC
LIMIT 10;

The output columns are as follows:

buffer_size – The size of buffer related to the relation
relation_size – The human-readable relation size
buffer_percentage – The percentage of shared_buffers occupied by the relation
relation_percentage – The percentage of relations that reside in shared_buffers
relation_type – The type of relation, such as table, index, or sequence

The following screenshot shows our output.

In this test scenario, shared_buffers is set at 1.95 GiB. The pgbench_accounts table is 139 GiB in size. 52.50% of shared_buffers is occupied by 1 GiB of this table. The rest of shared_buffers (.95 GiB) is used by other relations. If the working dataset of this table is larger than the cached data, to host more data for caching, shared_buffers should be modified to a higher value. While modifying the shared_buffers value, it’s imperative to monitor FreeableMemory, because other process-based memory allocations such work_mem and maintenance_work_mem also share memory from the same memory pool. If there isn’t sufficient space to increase the shared_buffers value, then it’s advisable to scale up the instance class to secure a larger total RAM allocation.

shared_buffers and pg_prewarm

For business-critical read workloads, having data preloaded in shared_buffers enhances performance. Amazon Aurora PostgreSQL and Amazon RDS for PostgreSQL support the pg_prewarm extension, which provides a convenient way to load relational data into the cache. In the preceding example, if you determine that pgbench_accounts_1 is a business-critical table and publishing a report is scheduled on the table, you can preload the data by running pg_prewarm for this table, as shown in the following code. Note that prewarming data pushes out the shared_buffers content based on an LRU basis, which can impact other read workloads.

SELECT pg_prewarm(‘pgbench_accounts_1’);

Conclusion

In this post, we explored the concept of shared_buffers and the pg_buffercache extension for Amazon Aurora PostgreSQL and Amazon RDS for PostgreSQL. The appropriate value for shared_buffers directly impacts the performance of PostgreSQL. Amazon Aurora PostgreSQL and Amazon RDS for PostgreSQL support the pg_buffercache extension, which provides a convenient way to get information about the buffer cache of a PostgreSQL database. CloudWatch metrics and Performance Insights counters help determine the workloads served by cached data. By understanding the state of the buffer cache, you can identify areas for performance optimization and fine-tune the configuration of PostgreSQL databases.

If you have any questions or suggestions about this post, leave a comment.

About the Authors

Kiran Singh is a Senior Partner Solutions Architect and an Amazon RDS and Amazon Aurora specialist at AWS focusing on relational databases. She helps customers and partners build highly optimized, scalable, and secure solutions; modernize their architectures; and migrate their database workloads to AWS.

Vivek Singh is a Principal Database Specialist Technical Account Manager with AWS focusing on Amazon RDS for PostgreSQL and Amazon Aurora PostgreSQL engines. He works with enterprise customers providing technical assistance on PostgreSQL operational performance and sharing database best practices. He has over 17 years of experience in open-source database solutions, and enjoys working with customers to help design, deploy, and optimize relational database workloads on AWS.

Read MoreAWS Database Blog

RELATED ARTICLES

LEAVE A REPLY

Please enter your comment!
Please enter your name here

Most Popular

Recent Comments