Tuesday, September 17, 2024
No menu items!
HomeDatabase ManagementIdentify PostgreSQL semantic issues with the plpgsql_check extension

Identify PostgreSQL semantic issues with the plpgsql_check extension

One critical aspect during the migration process from Oracle to PostgreSQL is identifying any semantic issues that may arise when translating Oracle-specific code to PostgreSQL. Although Oracle performs both semantic and syntax checks during code compilation, PostgreSQL focuses solely on syntax checks during code compilation, leaving room for potential semantic discrepancies in the converted code base. This could result in issues during functional testing and application runtime. To address this issue, database developers working with PL/pgSQL code in PostgreSQL can use the plpgsql_check extension.

In this post, we explain how you can use the plpgsql_check extension in PostgreSQL to catch potential semantic errors and elevate your overall code quality.

Solution overview

The plpgsql_check extension provides a set of tools to perform semantic checks on your PL/pgSQL code. plpgsql_check is most useful for developers who are working with large or complex SQL code bases, because it can help identify and resolve issues early in the development cycle. This extension helps in identifying a wide range of issues, including syntax errors, undefined variables, unused variables, and other potential bugs. It provides a detailed list of all identified issues, along with relevant information about the line of code where the issue was detected and a clear description of the problem.

The plpgsql_check extension has rich set of usage patterns, the details of which can be found in the GitHub repository. This post dives deep into the following usage patterns:

Functions and procedures for semantic checks in PostgreSQL
Triggers for semantic checks in PostgreSQL
Dependency list
Performance warnings

Amazon RDS for PostgreSQL or Amazon Aurora PostgreSQL support many extensions for the PostgreSQL database engine. For more details, refer to Extension versions for Amazon RDS for PostgreSQL and Extension versions for Amazon Aurora PostgreSQL.

However, this extension is not yet supported by Amazon Relational Database Service (Amazon RDS) for PostgreSQL or Amazon Aurora PostgreSQL-Compatible Edition. For this post, we use standard PostgreSQL installed on an Amazon Elastic Compute Cloud (Amazon EC2) machine and explain how plpgsql_check can help developers identify and correct errors before running the code. If you’re migrating to Amazon RDS for PostgreSQL or Amazon Aurora PostgreSQL, you can run a pg_dump operation and employ pg_restore to transfer the metadata to the EC2 instance.

Prerequisites

To get started with the solution described in this post, you should have the following prerequisites:

An active AWS account
A PostgreSQL database installed on Amazon EC2
The plpgsql_check extension installed as per the instructions in the GitHub repository

Challenges with semantic checks in PostgreSQL

Although Oracle performs both semantic and syntax checks during PL/SQL code compilation, PostgreSQL focuses solely on syntax checks. This behavior can lead to the following unforeseen semantic issues during functional testing when migrating from Oracle to PostgreSQL:

Semantic issues exist within the code and don’t arise during the compilation, but are reported when you run the PL/pgSQL code
When there are multiple semantic issues, not all issues are reported when you run the PL/pgSQL code
Semantic issues might not get detected even while running the function because the error conditions may not arise

Let’s understand these issues through the following examples.

Example 1
The following example uses the PL/pgSQL function calculate_area, which calculates the area of a square where users supply the length through the function parameter:

CREATE OR REPLACE FUNCTION calculate_area(p_length NUMERIC)
RETURNS INTEGER AS $$
DECLARE
v_dt1      DATE;
BEGIN
SELECT current_date INTO v_dt1 FROM DUAL;  /* Bug1 – table DUAL does not exist in PostgreSQL */

IF p_length <= 0 THEN
RETURN 0;
END IF;

RETURN p_length*p_width;    /* Bug2 – variable p_width is not declared within the function scope */
END; $$
LANGUAGE plpgsql;

We get the following output:

CREATE FUNCTION

Clearly, the function has two bugs:

The DUAL table does not exist in PostgreSQL by default
The variable p_width is not declared within the function scope but it’s getting accessed

These types of semantic issues don’t arise during the compilation of the function in PostgreSQL. However, they may appear when the function is run. Therefore, to identify and address these semantic issues within your PL/pgSQL code, it requires additional efforts to run the function and iteratively fix errors one by one:

postgres=# SELECT calculate_area(4.5);
ERROR: relation “dual” does not exist
LINE 1: SELECT current_date FROM DUAL
^
QUERY: SELECT current_date FROM DUAL
CONTEXT: PL/pgSQL function calculate_area(numeric) line 5 at SQL statement
Once you’ve addressed the previous error by removing the reference to
DUAL from the statement
SELECT current_date INTO v_dt1 FROM DUAL, replacing it with
SELECT current_date INTO v_dt1, you can proceed to test the function for any additional errors.
postgres=# SELECT calculate_area(4.5);
ERROR: column “p_width” does not exist
LINE 1: p_length*p_width
^
QUERY: p_length*p_width
CONTEXT: PL/pgSQL function calculate_area(numeric) line 11 at RETURN

Example 2

Sometimes, you might not detect the bugs even while running the function, because the error conditions may not arise. To better understand this, let’s consider an example.

Create a table t1 with two columns, a and b:

postgres=# CREATE TABLE t1(
a int,
b int
);

We get the following output:

CREATE TABLE

Let’s create a function that loops through the table data and does certain processing:

CREATE OR REPLACE FUNCTION f2()
RETURNS void
LANGUAGE plpgsql
AS
$$
DECLARE
rec record;
BEGIN
FOR rec IN SELECT * FROM t1
LOOP
RAISE NOTICE ‘%’, rec.c; /* Bug1: Columns ”c” does not exist in table “t” */
END LOOP;
END;
$$;

We get the following output:

CREATE FUNCTION

Now run the function:

/* No semantic issues found as table “t” is empty and control does not go inside the loop */
postgres=# SELECT f2();
f2
—-

(1 row)

This example illustrates how manually converting and testing functions could increase the likelihood of errors, potentially reducing the overall code quality.

Common semantic issues that plpgsql_check can identify

The plpgsql_check extension can identify various semantic issues in the PL/pgSQL code that may cause problems, such as the following:

Undefined variables – If you reference a variable that hasn’t been declared or is out of scope, plpgsql_check will flag it as an error.
Unused variables – It can highlight variables that are declared but never used within the function body.
Type mismatches – It helps identify type mismatches that might lead to runtime errors. For example, calling a procedure with a mismatch between parameter types. The function return type mismatches between the declared variable and the expected type.
Control flow problems – It identifies potential logic errors in conditional statements, loops, or branches:

Incorrect use of transactions, such as mismatches between BEGIN and COMMIT statements.
Incorrect use of exceptions, especially when there is no exception block for a raised exception within the same block.

Incorrect function calls – It detects when a function is called with the wrong number or type of arguments or when a dependent object doesn’t exist in the database.
Trigger-related issues – For PL/pgSQL triggers, it helps find problems related to OLD/NEW record usage, trigger control, and data type mismatches.
Incorrect use of SQL statements – It identifies issues like SELECT statement and INTO clause mismatches in terms of column counts.
Incorrect use of subqueries – It detects improper usage of subqueries, such as missing or extra columns in subqueries or missing aliases.
Incorrect use of cursors – It identifies incorrect cursor usage, like failing to open or close cursors properly.
Incorrect use of arrays – It detects issues with array usage, such as data type mismatches between array elements and table columns.

Important semantic check functions

The plpgsql_check extension provides a set of functions that allow you to check the syntax and semantics of PL/pgSQL code. The following are the main functions:

plpgsql_check_function – Checks the syntax and semantics of a specific PL/pgSQL function and returns the result as text
plpgsql_check_function_tb – Returns a human-readable tabular result of the check performed by plpgsql_check_function

Let’s dive into these functions and illustrate their usage through examples.

Create the following table to understand how the two functions work with a few PL/pgSQL code examples:

/* Create a table named as “bigtable” */
CREATE TABLE bigtable(
id   integer,
name text,
constraint bigtable_pk primary key(id)
);

/* Inserting sample data into “bigtable” */
INSERT INTO bigtable
SELECT i, ‘name-‘||i
FROM generate_series(1, 1000) i;

/* Sample Function that reads data from the “bigtable” based on the “id” passed */
CREATE OR REPLACE FUNCTION example01(_id numeric)
RETURNS numeric AS $$
DECLARE
r record;
s numeric DEFAULT 0;
BEGIN
FOR r IN SELECT * FROM bigtable WHERE id = _id
LOOP
s := s + r.student_name;
END LOOP;
END;
$$ LANGUAGE plpgsql;

In this example, we intentionally include two errors:

A wrong reference to the identifier r.student_name—there is no such column student_name in bigtable
The RETURN statement is missing

Although you have two semantic issues, the preceding function code will still be compiled with no errors. However, you may encounter runtime errors while running the function.

In your first run, pass an input to the function with value 1000:

SELECT example01(1000);

The command fails with an error: the control reached the end of the function without RETURN. This is because the value 1000 exists in bigtable and the row is processed but could not find the RETURN keyword to return the output value.

In a second run, change the input parameter to 1001:

SELECT example01(1001);

The command fails with a different error as following: record r has no field called student_name.

Now let’s understand how you can reveal the bugs through plpgsql_check_function and plpgsql_check_function_tb without even needing to run the code.

Semantic checks using plpgsql_check_function

This function analyzes a single PL/pgSQL function. It takes a function name or function OID as an argument and returns a table of issues found, with columns such as lineno, position, and message to identify the location and nature of each issue.

Check the syntax and semantics of the function discussed in the previous section using plpgsql_check_function:

SELECT * FROM plpgsql_check_function(‘example01(numeric)’);

In this case, the error message points to the line where the error occurred. However, it’s noteworthy that only one error is retrieved by default unless optional arguments are configured to obtain all errors (we discuss this in more detail in subsequent examples).

Semantic checks using plpgsql_check_function_tb

The plpgsql_check_function_tb function closely resembles plpgsql_check_function, except that it returns results in a tabular format with details like the line number of the function where the issue is, the error message, and other helpful information. This makes it straightforward for developers to understand errors and fix them quickly. The function is particularly useful for creating reports that list issues and provide a lot of information for careful examination and solving problems. See the following example code:

SELECT * FROM plpgsql_check_function_tb(‘example01(numeric)’);

We get the following output.

Semantic checks for triggers in PostgreSQL

Triggers within PostgreSQL are linked to a designated table, view, or foreign table, as well as to a function named your_trigger_function_name, which runs specific business operations outlined within it. The trigger function must be defined prior to creating the trigger itself. It must be declared as a function that takes no arguments and returning type trigger. This function is run each time the trigger is activated by any triggering event.

Challenges for semantic checks for triggers

Let’s create a sample table, a trigger, and the trigger function to understand how you can check semantic issues within a trigger function:

/* Create sample table */
CREATE TABLE employees(
id          INTEGER,
first_name VARCHAR(40) NOT NULL,
last_name  VARCHAR(40) NOT NULL,
CONSTRAINT employees_pk PRIMARY KEY(id)
);

INSERT INTO employees VALUES(1, ‘David’, ‘Mellon’);
INSERT INTO employees VALUES(2, ‘John’, ‘Kett’);
INSERT INTO employees VALUES(3, ‘Anjello’, ‘Pie’);

CREATE TABLE audit_employees_deleted(
id          INTEGER,
first_name VARCHAR(40) NOT NULL,
last_name  VARCHAR(40) NOT NULL,
CONSTRAINT audit_employees_deleted_pk PRIMARY KEY(id)
);

Create a trigger function that inserts a record in the audit_employees_deleted table for every employee deleted from the employees table:

CREATE OR REPLACE FUNCTION fnc_trg_delete_employee()
RETURNS TRIGGER AS $$
BEGIN
INSERT INTO audit_employees_deleted
VALUES (NEW.*);
END;
$$ LANGUAGE plpgsql;

We get the following output:

CREATE FUNCTION

In the preceding trigger function, there are two bugs:

The RETURN keyword is missing
DELETE triggers can’t refer to NEW

However, the compilation is successful, and because trigger functions are a special type of functions, you can’t test simply by calling them.

postgres=# SELECT fnc_trg_delete_employee();
ERROR:  trigger functions can only be called as triggers
CONTEXT:  compilation of PL/pgSQL function “fnc_trg_delete_employee” near line 1

Therefore, you need to create the trigger that calls the trigger function when it’s activated. Let’s create a trigger that is activated when you delete any record from the employees table and calls the trg_fnc_delete_employee() function to perform an insert operation in the audit_employees_deleted table:

CREATE TRIGGER trg_delete_employee
BEFORE DELETE
ON employees
FOR EACH ROW
EXECUTE FUNCTION fnc_trg_delete_employee();

We get the following output:

CREATE TRIGGER

Test manually to see the semantic issue

As explained previously, triggers are activated when the triggering event occurs. In the following example, the triggering event is DELETE, to delete an employee from the employees table:

DELETE FROM employees
WHERE id = 2;
ERROR:  null value in column “id” of relation “audit_employees_deleted” violates not-null constraint
DETAIL:  Failing row contains (null, null, null).
CONTEXT:  SQL statement “INSERT INTO audit_employees_deleted VALUES (NEW.*)”
PL/pgSQL function fnc_trg_delete_employee() line 3 at SQL statement

Semantic checks for triggers using plpgsql_check_function_tb

You can also use the plpgsql_check extension to check if a specified trigger is syntactically and semantically bug free. Because a trigger function and the trigger are tightly coupled and associated with a table, you need to specify the table name along with the trigger function name to plpgsql_check_function or plpgsql_check_function_tb to find the semantic issues. See the following code:

/* Error when trigger relation is not provided as argument */
postgres=# SELECT * FROM plpgsql_check_function_tb(‘ fnc_trg_delete_employee()’);

We get the following output.

See the following code for correct trigger semantic checking:

/* Produces the list of semantic issues in a trigger */
postgres=# SELECT * FROM plpgsql_check_function_tb(‘ fnc_trg_delete_employee()’, ’employees’);

We get the following output.

Find semantic errors and warnings using optional arguments

Both plpgsql_check_function and plpgsql_check_function_tb don’t reveal all semantic errors by default. You need to pass some optional arguments to get the full list of semantic errors within your PL/pgSQL code. Let’s explore some commonly used optional arguments. You can get the full list of optional arguments supported by these two functions in the GitHub repo.

fatal_errors

In the previous examples, only one error is reported by plpgsql_check_function and plpgsql_check_function_tb. To retrieve all errors in a PL/pgSQL function, you can use the fatal_errors optional parameter, which is of the BOOLEAN type. The default value of fatal_errors is TRUE, indicating that the function will stop on the first error to prevent massive error reports.

The syntax is as follows:

fatal_errors boolean DEFAULT true

Let’s test the plpgsql_check_function_tb function with the optional argument:

SELECT * FROM plpgsql_check_function_tb(‘example01’, fatal_errors=>false);

We get the following output:

With the fatal_errors optional argument and plpgsql_check_function_tb function, developers can generate all unknown bugs within the code and fix all of them in the development phase, thereby increasing code quality. This also helps error propagation to the next phase like functionality testing or integration testing.

Let’s fix all the issues reported within the function and compile once again to check if the plpgsql_check_function_tb function reports any other bugs:

CREATE OR REPLACE FUNCTION example01(_id numeric)
RETURNS numeric
LANGUAGE plpgsql
AS $function$
DECLARE
r record;
s numeric DEFAULT 0;
BEGIN
FOR r IN SELECT * FROM bigtable WHERE id = _id
LOOP
s := s + r.name;
END LOOP;

RETURN s;

END;
$function$;

We get the following output:

CREATE FUNCTION

After the fixed function is compiled, you can check it again to see that there are no additional errors:

SELECT * FROM plpgsql_check_function_tb(‘example01’, fatal_errors=>false);

We get the following output:

There are no more code semantic issues within the function. However, there could be some performance improvement areas that developers might overlook while migrating code from Oracle or writing any PL/pgSQL functions or procedures in PostgreSQL. The plpgsql_check extension can also help you identify some performance improvement indicators. Let’s explore how you can get performance improvement recommendations from plpgsql_check.

performance_warnings

This optional argument aids in identifying performance-related warnings, such as data type casting issues and implicit casts in the WHERE clause, which can be the reason why indexes are not used and many other performance improvement indicators.

The syntax is as follows:

performance_warnings boolean DEFAULT false

Let’s explore the functionality of the performance_warnings optional argument to identify some potential performance bottlenecks and recommendations generated through the plpgsql_check_function_tb function:

SELECT * FROM plpgsql_check_function_tb(‘example01’, performance_warnings => true);

We get the following output:

From the preceding screenshot, you can see the performance improvement recommendations:

Type casting can be dangerous sometimes when fetching data from the table. This is because the table data fetch could go to a sequence scan instead of an index scan when the table column data type doesn’t match the data provided. In our code, there is the statement SELECT * FROM bigtable WHERE id = _id, where the table id column data type is integer and the value passed from the argument is numeric type. Because of this, this step could be costly because the index scan is skipped.
PostgreSQL functions by default are created with the volatility classification set as VOLATILE unless otherwise specified. This is an important attribute because it can greatly increase performance when used correctly. For more information, refer to Function Volatility Categories. Because this function performs only read operations in the form of SELECT statements, it can be marked as STABLE to have better performance.

Let’s fix all performance improvement recommendations and test the function again:

CREATE OR REPLACE FUNCTION example01(_id integer)
RETURNS numeric
LANGUAGE plpgsql
STABLE
AS $function$
DECLARE
r record;
s numeric DEFAULT 0;
BEGIN
FOR r IN SELECT * FROM bigtable WHERE id = _id
LOOP
s := s + r.id;
END LOOP;

RETURN s;
END;
$function$;

We get the following output:

CREATE FUNCTION

After the fixed function is compiled, you can check it again to see that there are no additional errors:

SELECT * FROM plpgsql_check_function_tb(‘example01’, fatal_errors=>false);

We get the following output:

Output formats

The function plpgsql_check_function has three possible output formats: text, json, or xml. You can always transform the output format based on your application needs, as shown in the following example, where we format the output as a JSON document:

select * from plpgsql_check_function(‘example01(numeric)’, format:=’json’);

These are some of the commonly used optional arguments. For  the full list of optional arguments supported by plpgsql_check_function and plpgsql_check_function_tb, refer to Optional arguments.

Additional utility queries for semantic checks on all code objects

In this section, we present additional utility queries designed to facilitate semantic checks on all code objects within a PostgreSQL database. The following queries use the plpgsql_check_function function to comprehensively verify non-trigger PL/pgSQL functions, trigger PL/pgSQL functions, and all PL/pgSQL functions and procedures.

Check all non-trigger plpgsql user created functions:

SELECT p.oid, p.proname, plpgsql_check_function(p.oid,fatal_errors := false)
FROM pg_catalog.pg_namespace n
JOIN pg_catalog.pg_proc p ON pronamespace = n.oid
JOIN pg_catalog.pg_language l ON p.prolang = l.oid
WHERE l.lanname = ‘plpgsql’ AND p.prorettype <> 2279;

Check all trigger plpgsql functions:

SELECT p.oid, p.proname, plpgsql_check_function(p.oid,fatal_errors := false)
FROM pg_catalog.pg_namespace n
JOIN pg_catalog.pg_proc p ON pronamespace = n.oid
JOIN pg_catalog.pg_language l ON p.prolang = l.oid
WHERE l.lanname = ‘plpgsql’ AND p.prorettype <> 2279;

Check all plpgsql functions (functions or trigger functions with defined triggers):

SELECT (pcf).functionid::regprocedure, (pcf).lineno, (pcf).statement,
(pcf).sqlstate, (pcf).message, (pcf).detail, (pcf).hint, (pcf).level,
(pcf).”position”, (pcf).query, (pcf).context
FROM(SELECT
plpgsql_check_function_tb(pg_proc.oid, COALESCE(pg_trigger.tgrelid, 0),fatal_errors := false) AS pcf
FROM pg_proc
LEFT JOIN pg_trigger
ON (pg_trigger.tgfoid = pg_proc.oid)
WHERE
prolang = (SELECT lang.oid FROM pg_language lang WHERE lang.lanname = ‘plpgsql’) AND
pronamespace <> (SELECT nsp.oid FROM pg_namespace nsp WHERE nsp.nspname = ‘pg_catalog’) AND
— ignore unused triggers
(pg_proc.prorettype <> (SELECT typ.oid FROM pg_type typ WHERE typ.typname = ‘trigger’) OR
pg_trigger.tgfoid IS NOT NULL)
) ss ORDER BY (pcf).functionid::regprocedure::text, (pcf).lineno;

Limitations

When working with static code, plpgsql_check is expected to catch practically all issues. However, as with any tool, it has certain limitations:

False positives and negatives – Like any static analysis tool, plpgsql_check may generate false positives (flagging code as problematic when it’s not) or false negatives (not catching certain issues).
Dynamic SQL – plpgsql_check performs static analysis of PL/pgSQL code, meaning that it examines the code without actually running it. Code blocks that use dynamic SQL can be challenging to analyze without running them. plpgsql_check may not fully understand the dynamic SQL and might not be able to perform accurate analysis in such cases.
Schema changes – The extension checks the code within the context of the current schema. If there are dependencies on other schema database objects, plpgsql_check might not catch them accurately.
Temporary tables – plpgsql_check can’t analyze queries in conjunction with temporary tables that are created in the plpgsql function during runtime.
Version compatibility – As PostgreSQL evolves, new language features and syntax may be introduced, and plpgsql_check might not always support the latest features.
Refcursors – plpgsql_check can’t be used to detect the structure of referenced cursors because it’s dynamic in nature.

Conclusion

In this post, we described the use of the plpgsql_check extension to identify common mistakes in the early stages of the development process, leading to more reliable and efficient code. Despite its limitations, plpgsql_check remains an essential tool for pinpointing potential issues in PL/pgSQL code. This approach is recommended for developers who are working on large or complex SQL code bases, and offers the following benefits:

Catching errors early and saving time – The plpgsql_check extension helps developers detect errors early in the development process, thereby reducing the time and effort needed for debugging and fixing problems
Improving Code Quality – By examining both the syntax and semantics of the PL/pgSQL code, the plpgsql_check extension helps developers write better code that is more reliable and semantically bug-free (considering the limitations listed in the previous section)

Feel free to leave a comment if you have any questions.

About the Authors

Sashikanta Pattanayak works as a Lead Consultant with the Professional Services team at AWS, specializing in database modernization, optimization, and migration strategies. His role involves collaborating with clients to architect scalable, resilient, and secure solutions within the AWS Cloud environment. He specializes in homogeneous and heterogeneous database migrations.

Bikash Chandra Rout is a Lead Database Consultant with the Professional Services team at Amazon Web Services. Bikash focuses on helping customers build highly available, cost-effective database solutions and migrate their commercial engines to the AWS Cloud. He is passionate about databases and analytics.

Jitender Kumar is a Senior Lead Consultant with the Professional Services team at Amazon Web Services. He has worked on multiple databases as DBA and is skilled in SQL performance tuning and database migrations. He focuses on database migration to AWS and works with customers from assessment to implementation.

Prashant Borse is a Lead Consultant with the Professional Services team at Amazon Web Services. He has worked on multiple migration projects and is skilled in SQL performance tuning and database optimizations.

Read MoreAWS Database Blog

RELATED ARTICLES

LEAVE A REPLY

Please enter your comment!
Please enter your name here

Most Popular

Recent Comments