Wednesday, July 28, 2021
No menu items!
HomeDatabase ManagementSchedule jobs with pg_cron on your Amazon RDS for PostgreSQL or Amazon...

Schedule jobs with pg_cron on your Amazon RDS for PostgreSQL or Amazon Aurora for PostgreSQL databases

Scheduling jobs is an important part of every database environment. On a traditional on-premises database, you can schedule database maintenance jobs on the operating system where the database runs. When you migrate your databases to Amazon Relational Database Service (Amazon RDS) or Amazon Aurora, you lose the ability to log in to the host and schedule cron jobs. You can use services such as Amazon Elastic Compute Cloud (Amazon EC2), AWS Lambda, AWS Batch, Amazon CloudWatch, and Amazon EventBridge to schedule the maintenance tasks. However, you may want to work with pg_cron because this is an extension for this specific use case.

Starting with PostgreSQL version 12.5 and higher, Amazon RDS for PostgreSQL now supports the extension pg_cron, a simple cron-based job scheduler for PostgreSQL that runs inside the database. The pg_cron extension is also supported with Amazon Aurora PostgreSQL-Compatible Edition databases with versions 12.6 and higher.

You have a variety of ways to schedule jobs on AWS. The following table compares scheduling jobs on EC2 instances, Lambda, or pg_cron:

Amazon EC2
AWS Lambda
pg_cron
Instance provisioning
Instance should be provisioned all the time where cron jobs are scheduled
Serverless; no separate instance needed to schedule cron jobs
No separate instance is needed to schedule the jobs
DB authentication
Need to authenticate to database
Need to authenticate to database
Runs within the database, no additional authentication needed but requires permissions from rds_superuser to manage jobs
Runtime
No runtime limitation
Maximum runtime is 15 minutes
No runtime limitation
Metrics/Logging
Logs are stored on the EC2 instances
Logs are stored in CloudWatch logs
Logs are stored within the database in the table cron.job_run_details

In this post, we demonstrate how to use pg_cron along with some use cases of pg_cron with examples to show how to automate routine maintenance tasks.

Prerequisites

Before you get started, complete the following prerequisites:

Create or have access to an AWS account.
Have an RDS for PostgreSQL instance (version 12.5 or higher) or an Aurora PostgreSQL instance (version 12.6 or higher). For instructions on provisioning an instance, see Create and Connect to a PostgreSQL Database with Amazon RDS.
Have an EC2 instance to access the database instance with PostgreSQL installed and configured.
Because pg_cron is an extension that is created in the Postgres database, you must be connected to the database to create and manage the cron jobs.
To enable the pg_cron extension, modify the database parameter group and add the value pg_cron to the shared_preload_libraries parameter, then restart the database.

Create the extension pg_cron using command:

create extension pg_cron;

Job scheduling and unscheduling using pg_cron

Use the following commands to schedule and unschedule the jobs:

SELECT cron.schedule (‘job_name’,’schedule’,’command’);
SELECT cron.unschedule (job_id);
SELECT cron.unschedule (job_name);

The jobs are scheduled using the standard cron syntax (* * * * *), which is a set of five values in a line, indicating when the job should be run:

┌───────────── min (0 – 59)
│ ┌────────────── hour (0 – 23)
│ │ ┌─────────────── day of month (1 – 31)
│ │ │ ┌──────────────── month (1 – 12)
│ │ │ │ ┌───────────────── day of week (0 – 6) (0 to 6 are Sunday to
│ │ │ │ │                                                      Saturday, or use names; 7 is also Sunday)
│ │ │ │ │
│ │ │ │ │
*  *  *  *  *

Use cases for pg_cron

You can use pg_cron in a variety of use cases, such as scheduling vacuum, scheduling database refreshes, rebuilding indexes, and analyzing tables to update statistics. You can also manage partitions to handle the large volume of time series data using pg_cron and pg_partman extensions. In this post, we have covered some of the use cases of pg_cron with examples on how to automate these tasks.

Schedule vacuum and vacuum analyze

We highly recommend having autovacuum enabled in your databases. However, you may need to manually vacuum a database or a table at a specific time. You can schedule a manual daily vacuum job using the following command:

postgres=> SELECT cron.schedule(‘daily_manual_vacuum’, ’00 22 * * *’, ‘VACUUM’);

schedule
———-
34
(1 row)

The database users that run the cron jobs are granted permissions to the objects in the cron schema. The jobs fail and don’t run if the user that scheduled the cron jobs is deleted from the database.

By default, the jobs are scheduled in the database postgres. You can also schedule jobs on tables that reside in a different database. The following code is an example of scheduling a vacuum analyze on a table pgbench_history in the database mytestdb:

postgres=> SELECT cron.schedule(‘pgbench_history_analyze’, ’00 22 * * *’, ‘VACUUM ANALYZE pgbench_history’);

schedule
———-
35
(1 row)

postgres=> UPDATE cron.job SET database = ‘mytestdb’ WHERE jobid = 35;
UPDATE 1
postgres=>

Reindexing

Reindex command rebuilds one or more indexes, replacing the previous version of index. For more information about scenarios in which to use the reindex command, see REINDEX in the PostgreSQL documentation. With pg_cron, you can schedule a job to rebuild the indexes frequently for the required tables or schemas:

SELECT cron.schedule(‘reindex_pgbench_tellers’, ’30 04 * * *’, ‘REINDEX TABLE pgbench_tellers’);

Effectively manage partitions

Some organizations need to store and maintain large amounts of time series data such as weather records or patient health evolution metrics. Time series data could also be collecting metrics like events, clicks, application performance monitoring, and more. With the introduction of the extension pg_partman, you can now use pg_cron to automate the creation and maintenance of table partitions. For more information about effectively managing the time series data using pg_cron and pg_partman extensions, see Designing high-performance time series data tables on Amazon RDS for PostgreSQL.

Invoke a function

Another use case of pg_cron is that you can routinely invoke a Postgres function. The following example creates a test table, inserts sample records, and creates a function to insert the data in the table:

postgres=> Create table test_table(n int ,n1 date);
CREATE TABLE
postgres=>
postgres=> insert into test_table values (1,’2021-03-23′);
INSERT 0 1
postgres=> insert into test_table values (2,’2021-03-23′);
INSERT 0 1
postgres=> insert into test_table values (3,’2021-03-23′);
INSERT 0 1
postgres=>
postgres=> select * from test_table;
n | n1
—+————
1 | 2021-03-23
2 | 2021-03-23
3 | 2021-03-23
(3 rows)
postgres=> Create or replace function test_function(n int) returns int as
postgres→ $$Begin
postgres$> Insert into test_table values (n,’2021-03-23′);
postgres$> Return 1;
postgres$> End;
postgres$> $$
postgres-> Language ‘plpgsql’;
CREATE FUNCTION
postgres=>

To schedule the function call using pg_cron, use the following command:

select cron.schedule(‘*/5 * * * *’,’select test_function(10)’);
select cron.schedule(‘*/5 * * * *’,’select test_function(11)’);
select cron.schedule(‘*/5 * * * *’,’select test_function(12)’);

pg_cron parameters

You can change various parameters to change the behavior of pg_cron, such as cron.database_name, cron.log_run, cron.log_statement, cron.max_running_jobs, and cron.use_background_workers.

Except for cron.use_background_workers, all the other parameters can be modified by editing the database instance parameter group and restarting the database.

Monitor jobs with pg_cron

The table cron.job contains metadata of the jobs that are scheduled in the database, such as in the following example code:

select * from cron.job;

jobid | schedule | command | nodename | nodeport | database | username | active | jobname
——-+————-+————————————————————+———–+———-+———-+———-+——–+————————-
30 | 30 04 * * * | REINDEX TABLE pgbench_tellers | localhost | 5432 | mytestdb | postgres | t | reindex_pgbench_tellers
31 | */5 * * * * | select test_function(10) | localhost | 5432 | postgres | postgres | t |
34 | 00 22 * * * | VACUUM | localhost | 5432 | postgres | postgres | t | daily_manual_vacuum
35 | 00 22 * * * | VACUUM ANALYZE pgbench_history | localhost | 5432 | mytestdb | postgres | t | pgbench_history_analyze

(4 rows)

AWS also contributed to pg_cron by adding support for the audit table cron.job_run_details. This table is helpful for getting historic information about scheduled jobs, and it provides details like the start and end time of a job, the job status, and any error message returned by the job. See the following example code:

select * from cron.job_run_details;

jobid | runid | job_pid | database | username | command | status | return_message | start_time | end_time
——-+——-+———+———-+———-+——————————–+———–+—————-+——————————+——————————-
30 | 42486 | 27260 | mytestdb | postgres | REINDEX TABLE pgbench_tellers | succeeded | REINDEX | 2021-04-11 04:30:00.125779+00 | 2021-04-11 04:30:00.143258+00
31 | 591 | 20522 | postgres | postgres | select test_function(10) | succeeded | SELECT 1 | 2021-04-11 22:10:00.02484+00 | 2021-04-11 22:10:00.048407+00
34 | 11870 | 5255 | postgres | postgres | VACUUM | succeeded | VACUUM | 2021-04-11 22:00:00.232327+00 | 2021-04-11 22:00:00.300282+00
35 | 42251 | 19680 | mytestdb | postgres | VACUUM ANALYZE pgbench_history | succeeded | VACUUM | 2021-04-10 22:00:00.16846+00 | 2021-04-10 22:00:00.309454+00

(4 rows)

Clear the audit logs

The job run detail logs table can grow very quickly. As a best practice, consider scheduling a job to clean up the logs according to your organization’s log retention policy. In this example, we schedule a job to delete all the records that are older than 30 days:

postgres=> SELECT cron.schedule(’00 15 * * *’, $$DELETE FROM cron.job_run_details where database=’mytestdb’ and end_time < now()- interval’30 days’$$);

schedule
———-
36
(1 row)

Upgrade the pg_cron extension

The release of newer versions of postgres extensions is independent of the PostgreSQL database new releases. Upgrading the PostgreSQL database doesn’t upgrade the extensions. If a new version of a postgres extension is available, you can see it in the view pg_available_extension_versions or you can refer to the list of supported extensions with Amazon Aurora PostgreSQL. For Amazon RDS PostgreSQL, you can see this list of supported extensions.

To upgrade an extension to a newer supported version, see Upgrading Postgres extensions.

What happens to the pg_cron jobs when you take a snapshot and restore an instance?

The pg_cron jobs are included in RDS instance snapshots. They’re copied to any new RDS instance created from the snapshot. If you’re assigning a new database parameter group to the restored RDS instance, make sure that the parameter shared_preload_libraries has the value pg_cron added to it. pg_cron won’t run if this value is missing in the parameter shared_preload_libraries.

If you’re restoring the database in a different time zone, you should adjust the schedule of the jobs on the target instance accordingly. For example, if you have a pg_cron job that is scheduled at 8:00 PM in the UTC time zone and you take a snapshot of your instance and restore it in the time zone UTC-4, the jobs run on the restored instance at 8:00 PM in the UTC-4 time zone. We recommend that you review your job schedules after restoring the instance in a new time zone.

pg_cron with multi-AZ and read replicas

If you’re running RDS for PostgreSQL in a multi-AZ configuration, there’s no impact to the scheduled pg_cron jobs following a failover. The jobs continue to run as usual.

The jobs don’t run on the read replica instances. If you promote a read replica, the jobs are copied and run with the same schedule on the newly promoted primary instance. You can delete or change the schedules on the new primary instance.

Limitations

You can’t use pg_cron jobs to schedule a job in seconds interval. For example, you can’t schedule a cron job to run every 10 seconds. For such scenarios, it’s better to write a shell script that uses the sleep command.

Summary

This post explains how to schedule and monitor jobs on your RDS for PostgreSQL or Aurora PostgreSQL databases using pg_cron. Additionally, this post also demonstrates the different use cases of pg_cron along with the examples. To learn more about pg_cron, see Scheduling maintenance with the PostgreSQL pg_cron extension.

For any questions or suggestions about this post, feel free to leave a comment.

About the Author

Sukhpreet Kaur Bedi is a Database Specialist Solutions Architect with AWS focusing on Amazon RDS/Aurora PostgreSQL engines. She helps customers innovate on the AWS platform by building highly available, scalable, and secure database architectures.

 

Read MoreAWS Database Blog

RELATED ARTICLES

LEAVE A REPLY

Please enter your comment!
Please enter your name here

Most Popular

Recent Comments