Thursday, March 28, 2024
No menu items!
HomeDatabase ManagementMigrate Oracle hierarchical queries to Amazon Aurora PostgreSQL

Migrate Oracle hierarchical queries to Amazon Aurora PostgreSQL

We have seen a number of organizations are migrating their database workloads from commercial database engines to the Amazon Aurora database environment. These organizations have reduced their overall efforts on common database administration tasks, data center maintenance, and have moved away from proprietary database features and commercial licenses.

AWS provides the AWS Schema Conversion Tool (AWS SCT) which simplifies schema conversion for heterogeneous database migration. AWS SCT reports objects and SQL requiring manual efforts for conversion. Hierarchical queries conversion requires additional efforts to convert, validate, and test in comparison to objects automatically converted by AWS SCT.

In this post, we show you how to migrate different hierarchical queries from Oracle to Amazon Aurora PostgreSQL-Compatible Edition using recursive queries in common table expressions (CTE). We also look at a few limitations of the tablefunc extension supported by Aurora PostgreSQL.

Hierarchical queries

In Oracle, hierarchical queries are used to query data that has a parent-child relationship where each child can have only one parent, whereas a parent can have multiple children. This is very useful when trying to build reporting queries such as product lineage, manager reports, and a family tree. A hierarchical query displays organized rows in a tree structure, so in order to retrieve the data, it has to be traversed starting from the root. The following diagram illustrates a sample tree structure.

In the above diagram, the first node at the top of the hierarchy, A1, is called the root node, and the rest of the nodes such as B1, B2, and B3 are the children nodes. If a query needs to find the hierarchy of node D1, it will scan the tree from A1 to B3 and then to C2, traversing down to D1.

Prerequisites

We use the following sample table and data throughout our examples in this post.

Oracle
PostgreSQL

create table hier_test(
emp_no number,
ename varchar2(5),
job varchar2(9),
manager_no number
);
insert into hier_test
values(10,’A1′,’CEO’,null);
insert into hier_test
values(11, ‘B1’, ‘VP HARDWARE’, 10);
insert into hier_test
values(12, ‘B2’, ‘VP ADMIN’, 10);
insert into hier_test
values(13, ‘B3’, ‘VP DEVELOPMENT’, 10);
insert into hier_test
values(14, ‘C1’, ‘DIRECTOR DEVELOPMENT’, 13);
insert into hier_test
values(15, ‘C2’, ‘DIRECTOR DEVELOPMENT’, 13);
insert into hier_test
values(16, ‘D1’, ‘MANAGER DEVELOPMENT’, 15);
insert into hier_test
values(17 ,’E1′, ‘ENGINEER HARDWARE’, 11);
insert into hier_test
values(18, ‘E2’, ‘ENGINEER HARDWARE’, 11);
create table hier_test(
emp_no int,
ename varchar(5),
job varchar(9),
manager_no int
);
insert into hier_test
values(10,’A1′,’CEO’,null);
insert into hier_test
values(11, ‘B1’, ‘VP HARDWARE’, 10);
insert into hier_test
values(12, ‘B2’, ‘VP ADMIN’, 10);
insert into hier_test
values(13, ‘B3’, ‘VP DEVELOPMENT’, 10);
insert into hier_test
values(14, ‘C1’, ‘DIRECTOR DEVELOPMENT’, 13);
insert into hier_test
values(15, ‘C2’, ‘DIRECTOR DEVELOPMENT’, 13);
insert into hier_test
values(16, ‘D1’, ‘MANAGER DEVELOPMENT’, 15);
insert into hier_test
values(17 ,’E1′, ‘ENGINEER HARDWARE’, 11);
insert into hier_test
values(18, ‘E2’, ‘ENGINEER HARDWARE’, 11);

Although PostgreSQL doesn’t have functions or predefined keywords to handle the hierarchical queries directly, you can define custom solutions with the help of the tablefunc extension and CTEs. Tablefunc is useful for hierarchical queries with connect by and level keywords, but with CTE we can support various types of keywords such as LEVEL, NOCYCLE, SYS_CONNECT_BY_PATH, ORDER SIBLINGS BY, CONNECT_BY_ISLEAF, and CONNECT_BY_ROOT of hierarchical queries. We are going to discuss and look at these scenarios in details in the following sections.

tablefunc extension

The tablefunc extension has a function called connectby, which produces a display of hierarchical data that is stored in a table.

For the connectby function to work, the table needs the following:

A key field that uniquely identifies rows
A parent-key field that references the parent (if any) of each row

This function can display the sub-tree descending from a row. The primary use case of this function is to display parent-child connections (hierarchy data).

The following code demonstrates our PostgreSQL query:

SELECT * FROM
Connectby(‘hier_test’, ‘emp_no’, ‘manager_no’, ‘10’, 0, ‘->’) AS t(emp_no int, manager_no int, level int, ord text)
order by emp_no;

The following screenshot shows our output.

The connectby function works best when only the parent, child, and level attributes are selected. For example, in the following query, fetching the ename attribute results in an error:

SELECT * FROM
Connectby(‘hier_test’, ’emp_no’, ‘manager_no’, ’10’, 0, ‘->’,’ename’) AS
t(emp_no int, manager_no int, level int, ord text,ename text)
order by emp_no;

We get the following output.

Additionally, connectby cannot be used with hierarchical queries built using keywords like NOCYCLE, CONNECT_BY_ISLEAF, SYS_CONNECT_BY_PATH, ORDER SIBLINGS BY, and CONNECT_BY_ROOT.

Recursive queries

We can achieve hierarchical queries using CTE recursive queries. A recursive query is one that is defined by a UNION ALL with an initialization fullselect that seeds the recursion. The iterative (recursive) fullselect contains a direct reference to itself in the FROM clause. See the following code:

WITH RECURSIVE <tab_name>(column_list)
AS
(
— seed query
Anchor query
UNION ALL
— Recursive member that references <tab_name>.
recursive_query
)
— references expression name
SELECT *
FROM <tab_name> ;

Let’s go through various use cases of Oracle hierarchical queries and achieve similar functionality using recursive SQL in PostgreSQL.

Scenario 1: Display employee level along with other details

The keyword Level describes the depth of node in the hierarchy. The first level is the root of hierarchy.

The following code shows our Oracle query:

SELECT emp_no,ename,job,level
FROM hier_test
CONNECT BY PRIOR emp_no = manager_no
START WITH manager_no IS NULL
order by level ;

The following screenshot shows our results.

In PostgreSQL, the non-recursive part generates the root of the hierarchy (top-down), which is the employee with no manager ( manager_no is null) or with a specific manager (manager_n = 10). The recursive part generates the hierarchy by joining the main table with the output of the non-recursive query until the join condition (e.manager_no = c.emp_no) is true. See the following PostgreSQL query:

WITH RECURSIVE cte AS (
SELECT emp_no, ename,job, manager_no, 1 AS level
FROM hier_test
where manager_no is null
UNION ALL
SELECT e.emp_no, e.ename, e.job,e.manager_no, c.level + 1
FROM cte c
JOIN hier_test e ON e.manager_no = c.emp_no
)
SELECT emp_no,ename,job,level
FROM cte
order by level;

We get the following output.

Scenario 2: Hierarchical queries with SYS_CONNECT_BY_PATH

The keyword SYS_CONNECT_BY_PATH returns the path of a column value from root to node, with column values separated by char(delimiter) for each row returned by the CONNECT BY condition.

Use the following Oracle query:

SELECT emp_no,ename,job,level,SYS_CONNECT_BY_PATH(ename,’;’)
FROM hier_test
CONNECT BY PRIOR emp_no = manager_no
START WITH manager_no is null
order by level ;

We get the following results.

In PostgreSQL, we can achieve a functionality similar to SYS_CONNECT_BY_PATH by concatenating the parent and child record attributes with a char/delimiter in every iteration. See the following code:

WITH RECURSIVE cte(emp_no, manager_no, ename,job, level, path)
AS (
SELECT emp_no, manager_no, ename, job,1 AS level,
‘;’||ename AS path
FROM hier_test
WHERE manager_no is null
UNION ALL
SELECT e.emp_no, e.manager_no, e.ename,e.job, c.level + 1 AS level,
c.path||’;’||e.ename AS path
FROM hier_test e, cte c
WHERE e.manager_no = c.emp_no
)
SELECT emp_no ,ename ,job,level,path
FROM cte
order by level,emp_no;

We get the following output.

Scenario 3: Hierarchical queries with NOCYCLE

The NOCYCLE parameter instructs Oracle Database to return rows from a query even if a CONNECT BY LOOP exists in the data.

If there is data in which the child is the parent and the parent is the child, then the hierarchical query goes into a data loop. The NOCYCLE keyword helps us avoid this loop.

To create a cycle in our data, we added a record of another employee as emp_no 13 with their manager as emp_no 14. With our new record, the sample data looks like the following screenshot.

When you run the SQL code from the previous scenario in Oracle without adding NOCYCLE, you encounter an error:

SELECT emp_no,ename,job,level,SYS_CONNECT_BY_PATH(ename,’;’)
FROM hier_test
CONNECT BY PRIOR emp_no = manager_no
START WITH manager_no is null
order by level ;

The following screenshot shows our output.

Adding the NOCYCLE keyword in the Oracle query gives you expected results. See the following code:

SELECT emp_no,ename,job,level,SYS_CONNECT_BY_PATH(ename,’;’)
FROM hier_test
CONNECT BY NOCYCLE PRIOR emp_no = manager_no
START WITH manager_no is null
order by level ;

We get the following results.

In PostgreSQL, we use two fields, route and cycle, to achieve functionality similar to NOCYCLE. route is an array of already visited values and cycle is a flag that gets set based on if a value is already present in route and the condition cycle = false filters out cyclic records. See the following code:

WITH RECURSIVE cte(emp_no, manager_no, ename,job, level, route,cycle, path)
AS (
SELECT emp_no, manager_no, ename, job,1 AS level, array[emp_no] AS route,false AS cycle,
‘;’||ename AS path
FROM hier_test
WHERE manager_no is null
UNION ALL
SELECT e.emp_no, e.manager_no, e.ename,e.job, c.level + 1 AS level,c.route || e.emp_no ,e.emp_no = ANY(c.route) as cycle,
c.path||’;’||e.ename AS path
FROM hier_test e, cte c
WHERE e.manager_no = c.emp_no AND cycle = false
)
SELECT emp_no,ename,job,level,path
FROM cte
WHERE cycle = false ;

The following screenshot shows our output.

Scenario 4: Hierarchical queries with ORDER SIBLINGS BY

The optional SIBLINGS keyword specifies an order that first sorts the parent rows, then sorts the child rows of each parent for each level within the hierarchy. See the following Oracle query:

SELECT emp_no,ename,job,manager_no,level
from hier_test
start with manager_no is null
CONNECT BY nocycle PRIOR emp_no = manager_no
order siblings by ename ;

The following screenshot shows our results.

In PostgreSQL, we can achieve a functionality similar to ORDER BY SIBLINGS by ordering the CTE output by path. The path is a concatenation of attributes mentioned in the ORDER BY clause in the Oracle query.

In the following PostgreSQL query, the path attribute has ename (emp_no is optionally included to handle scenarios of different emp_no values with the same ename under the same manager):

WITH RECURSIVE cte(emp_no, manager_no, ename,job, level, route,cycle, path)
AS (
SELECT
emp_no, manager_no, ename, job,1 AS level, array[emp_no] AS route,false AS cycle,
‘;’||ename||emp_no AS path
FROM hier_test
WHERE manager_no is null
UNION ALL
SELECT
e.emp_no, e.manager_no, e.ename,e.job, c.level + 1 AS level,c.route || e.emp_no ,e.emp_no = ANY(c.route) as cycle,
c.path||’;’||e.ename||e.emp_no AS path
FROM hier_test e, cte c
WHERE e.manager_no = c.emp_no AND cycle = false
)
SELECT
emp_no,ename,job,manager_no,level
FROM cte
WHERE cycle = false
ORDER BY path,emp_no ;

We get the following output.

Scenario 5: Hierarchical queries with CONNECT_BY_ISLEAF

The CONNECT_BY_ISLEAF pseudo column returns 1 if the current row is a leaf of the tree defined by the CONNECT BY condition. Otherwise it returns 0. This information indicates whether a given row can be further expanded to show more of the hierarchy.

Use the following Oracle query:

SELECT emp_no,ename,job,manager_no,level,SYS_CONNECT_BY_PATH (ename,’;’) PATH,CONNECT_BY_ISLEAF ISLEAF
from hier_test
start with manager_no is null
CONNECT BY nocycle PRIOR emp_no = manager_no
order siblings by job ;

We get the following results.

In PostgreSQL, we can achieve a functionality similar to CONNECT_BY_ISLEAF by checking whether the child node is a part of the parent nodes returned by the CTE or not. See the following code:

WITH RECURSIVE cte(emp_no, manager_no, ename,job, level, route,cycle, path)
AS (
SELECT
emp_no, manager_no, ename, job,1 AS level, array[emp_no] AS route,false AS cycle, ‘;’||ename AS path, ‘;’||job||emp_no AS ordersnblpath ,emp_no as root_id
FROM hier_test
WHERE manager_no is null
UNION ALL
SELECT
e.emp_no, e.manager_no, e.ename,e.job, c.level + 1 AS level,c.route || e.emp_no ,e.emp_no = ANY(c.route) as cycle, c.path||’;’||e.ename AS path,
c.ordersnblpath||’;’||e.job||e.emp_no AS ordersnblpath, c.root_id
FROM hier_test e, cte c
WHERE e.manager_no = c.emp_no AND cycle = false
)
SELECT
emp_no,ename,job,manager_no,level,path
, not exists (select * from cte p where p.manager_no = e.emp_no and cycle = false) as is_leaf
FROM cte e
WHERE cycle = false
ORDER BY ordersnblpath,emp_no ;

The following screenshot shows our output.

Scenario 6: Hierarchical queries with CONNECT_BY_ROOT

CONNECT_BY_ROOT is a unary operator that is valid only in hierarchical queries. When you qualify a column with this operator, Oracle returns the column value using data from the root row. See the following query:

SELECT emp_no,ename,job,manager_no,level,SYS_CONNECT_BY_PATH (ename,’;’) PATH,CONNECT_BY_ROOT ename
from hier_test
start with manager_no is null
CONNECT BY nocycle PRIOR emp_no = manager_no;

We get the following results.

In PostgreSQL, we can achieve functionality similar to CONNECT_BY_ROOT by using substr or a split function on the path attributed in the CTE output:

WITH RECURSIVE cte(emp_no, manager_no, ename,job,level, route,cycle, path)
AS (
SELECT
emp_no, manager_no, ename, job,1 AS level, array[emp_no] AS route,false AS cycle,
‘;’||ename AS path ,emp_no as root_id
FROM hier_test
WHERE manager_no is null
UNION ALL
SELECT
e.emp_no, e.manager_no, e.ename,e.job, c.level + 1 AS level,c.route || e.emp_no ,e.emp_no = ANY(c.route) as cycle,
c.path||’;’||e.ename AS path, c.root_id
FROM hier_test e, cte c
WHERE e.manager_no = c.emp_no AND cycle = false
)
SELECT
emp_no,ename,job,manager_no,level,path
,SPLIT_PART(path,’;’,2) root
FROM cte e
WHERE cycle = false
ORDER BY path, emp_no ;

We get the following output.

Considerations

Consider the following when using this solution in your environment:

The dataset we used for this post is simple in nature and may not reflect the data complexity of your environment
Not all scenarios or keywords of Oracle’s hierarchical queries are discussed in this post
Test the solution and queries you’re going to build referencing the sample queries for functional and performance requirements

Conclusion

In this post, we demonstrated via sample queries how you can migrate Oracle hierarchical queries using keywords LEVEL, NOCYCLE, SYS_CONNECT_BY_PATH, ORDER SIBLINGS BY, CONNECT_BY_ISLEAF, and CONNECT_BY_ROOT to PostgreSQL. We also talked about use cases and shortcomings of the tablefunc extension when migrating Oracle hierarchical queries.

Check out Database Migration—What Do You Need to Know Before You Start? to get started. Also review the recommended best practices, including the migration process and infrastructure considerations, source database considerations, and target database considerations for the PostgreSQL environment.

If you have any questions, comments, or other feedback, share your thoughts on the Amazon Aurora Discussion Forums.

About the Authors

Rakesh Raghav is a Database Specialist with the AWS Professional Services in India, helping customers have a successful cloud adoption and migration journey. He is passionate about building innovative solutions to accelerate their database journey to cloud.

Anuradha Chintha is a Lead Consultant with Amazon Web Services. She works with customers to build scalable, highly available, and secure solutions in the AWS Cloud. Her focus area is homogeneous and heterogeneous database migrations.

Read MoreAWS Database Blog

RELATED ARTICLES

LEAVE A REPLY

Please enter your comment!
Please enter your name here

Most Popular

Recent Comments