Tuesday, September 17, 2024
No menu items!
HomeDatabase ManagementGenerate Excel workbooks from Amazon RDS for PostgreSQL or Amazon Aurora PostgreSQL

Generate Excel workbooks from Amazon RDS for PostgreSQL or Amazon Aurora PostgreSQL

When using databases, you may need to move data into different systems, including other databases or other tools, to perform different analysis. One method to do this is through flat files, such as comma-separated value lists (CSVs), as these let you transfer data in a common format that can be loaded into other systems. PostgreSQL provides the “COPY” command to assist with the generation of flat files.

Some data extraction use cases may require additional steps. For example, Microsoft Excel, which is often used to perform additional analysis or charting on database data, has a feature that lets you keep data in multiple worksheets for other formatting. Data that is exported using the CSV format does not contain the additional info that simplifies visualizing data in Excel.

In this post, we walk you through a solution that provides a mechanism to generate Excel workbooks from Amazon Relational Database Service (Amazon RDS) for PostgreSQL or Amazon Aurora PostgreSQL-Compatible Edition databases. The solution comprises custom procedures and functions in the database, an AWS Lambda function built using Python, and an Amazon Simple Storage Service (Amazon S3) bucket to store the files. Note that the solution works for both Amazon RDS for PostgreSQL and Amazon Aurora PostgreSQL-compatible databases. We are demonstrating the solution for Amazon RDS for PostgreSQL database in this post.

Solution overview

The solution consists of the following components:

Amazon RDS for PostgreSQL
Amazon S3
AWS Lambda

The following diagram illustrates the architecture used in this post.

The high-level steps for the solution are as follows:

Database or application users run database procedures or functions to process the required data from a table or multiple tables based on the reporting data, and invoke the aws_s3 function to export the data as flat files to an S3 bucket. These files are in CSV format.
The aws_s3 functions invoke the Lambda function to process the CSV files in an S3 bucket and generate the Excel file as needed.
The Lambda function stores the Excel file generated in the same S3 bucket.
Database or application users can download the Excel reports to their local machines or implement scripts to store these reports in a centralized server.

Prerequisites

The following are the prerequisites before you begin to implement the solution:

An existing AWS account or create a new AWS account
An existing Amazon RDS for PostgreSQL database or create the database if you are creating a new AWS account.
Setup Amazon RDS for instance to access Amazon S3 bucket. You can follow the instructions in Setting up access to an Amazon S3 bucket or refer to the Create IAM policies and roles and Integrate Amazon S3 and Aurora PostgreSQL sections of Work with files in Amazon Aurora PostgreSQL and Amazon RDS for PostgreSQL.

Implement the solution

This section outlines the detailed steps required to implement the solution. The implementation is based on the scripts stored in the aws-aurora-rds-postgresql-excel-generator repository, which consists of the following files:

aws-aurora-rds-postgresql-excel-generator/sql/utl_file_utility.sql – This file contains the SQL procedures and functions (UTL_FILE) that help store the data in the S3 buckets
aws-aurora-rds-postgresql-excel-generator/sql/PostgreSQL_excel_generator_pkg.sql – This file contains the SQL procedures and functions to format the data in an Excel style
aws-aurora-rds-postgresql-excel-generator/sql/sample_proc_to_format_cells.sql – This file contains a sample procedure to process the data and generate CSV files
aws-aurora-rds-postgresql-excel-generator/Python/zip_rename_to_xlsx.py – This Python script processes the CSV files and generates the Excel sheets

The following are the detailed steps for implementation:

Amazon S3 and Lambda Setup:

Login to your AWS account and execute the following steps:

Create an S3 bucket and a base folder to store the Excel reports. For this post, we create the bucket pg-reports-testing and base folder excel_reports.
Create a Lambda function using the code in zip_rename_to_xlsx.py.

You can see the code after you create the Lambda function and upload the code as shown in the following screenshot.

You must add the following environment variables for your Lambda function:

BUCKET_NAME<your bucket name>
BASE_FOLDER<your base folder>

The following screenshot is a reference. For our example, the bucket name is pg-reports-testing and the base folder is excel_reports.

The next step is to provide database access to the Lambda function. For instructions, refer to Invoking an AWS Lambda function from an Aurora PostgreSQL DB cluster.

Create a policy using the code as shown in the following template updating the Resource attribute with your Lambda function’s Amazon Resource Name (ARN).

{
“Version”: “2012-10-17”,
“Statement”: [
{
“Sid”: “AllowAccessToExampleFunction”,
“Effect”: “Allow”,
“Action”: “lambda:InvokeFunction”,
“Resource”: “arn:aws:lambda:us-east-1:xxxx:function:CreateExcelReport”
}
]
}

A message appears when the policy has been created successfully.

Create a role for Amazon RDS for PostgreSQL or Aurora PostgreSQL by choosing RDS – Add Role to Database as the use case.

Attach the policy to the role as shown in the following screenshot.

Attach the role to the instance as shown in the following screenshot.

PostgreSQL Setup

Connect to your Amazon RDS for PostgreSQL instance and execute the following steps. You must run this command as a rds_superuser:

Connect to your database and add the aws_s3 extension.

psql -h pg-instance.xxxx.us-east-1.rds.amazonaws.com -p 5432 -U postgres -d excel_test

excel_test=> create extension aws_s3 CASCADE;
NOTICE: installing required extension “aws_commons”
CREATE EXTENSION
excel_test=> dx aws_*
List of installed extensions
Name | Version | Schema | Description
————-+———+——–+———————————————
aws_commons | 1.2 | public | Common data types across AWS services
aws_s3 | 1.1 | public | AWS S3 extension for importing data from S3
(2 rows)

Note: An excel_test database was created for the solution in the post.

Install the UTL_FILE utility by running the script sql/utl_file_utility.sql in the RDS for PostgreSQL or Aurora PostgreSQL database.

Note that the AWS Region is in the init() function of the script. If you want a different AWS Region, you can change it in the function prior to running the script.

psql -h pg-instance.xxxx.us-east-1.rds.amazonaws.com -p 5432 -U postgres -d excel_test -f sql/utl_file_utility.sql > utl.log 2>&1 &

excel_test=> dn utl_file_utility
List of schemas
Name | Owner
——————+———-
utl_file_utility | postgres
(1 row)
excel_test=> df utl_file_utility.*
List of functions
Schema | Name | Result data type | Argument data types | Type
——————+————-+—————————-+——————————————————————————————————————————————-+——
utl_file_utility | fclose | numeric | p_path character varying, p_file_name character varying | func
utl_file_utility | fclose_csv | numeric | p_path character varying, p_file_name character varying | func
utl_file_utility | fclose_xlsx | numeric | p_path character varying, p_file_name character varying | func
utl_file_utility | fgetattr | bigint | p_path character varying, p_file_name character varying | func
utl_file_utility | fopen | utl_file_utility.file_type | p_path character varying, p_file_name character varying, p_mode character DEFAULT ‘W’::bpchar, OUT p_file_type utl_file_utility.file_type | func
utl_file_utility | get_line | text | p_path character varying, p_file_name character varying, p_buffer text | func
utl_file_utility | init | void | | func
utl_file_utility | is_open | boolean | p_path character varying, p_file_name character varying | func
utl_file_utility | put_line | boolean | p_path character varying, p_file_name character varying, p_line text, p_flag character DEFAULT ‘W’::bpchar | func

Update the bucket and base folder details by inserting a record in the table utl_file_utility.all_directories:

INSERT INTO utl_file_utility.all_directories VALUES (nextval(‘utl_file_utility.all_directories_id_seq’), ‘REPORTS’, ‘pg-reports-testing’, ‘excel_reports’);

Note that:

REPORTS is a directory name that you use as an input while generating reports
pg-reports-testing is the S3 bucket name
excel_reports is a directory inside the S3 buckets to store the CSV files.

excel_test=> SELECT * FROM utl_file_utility.all_directories;
id | directory_name | s3_bucket | s3_path
—-+—————-+——————–+—————
1 | REPORTS | pg-reports-testing | excel_reports
(1 row)
excel_test=>

Note that REPORTS is a directory name that you use as an input while generating reports. Basically, it is categorization of the report. If you have multiple category reports which you save in different buckets or paths, you can insert multiple rows accordingly.

Install the Lambda extension using the following commands:

excel_test=> CREATE EXTENSION IF NOT EXISTS aws_lambda CASCADE;
CREATE EXTENSION
excel_test=> dx aws_la*
List of installed extensions
Name | Version | Schema | Description
————+———+——–+————————
aws_lambda | 1.0 | public | AWS Lambda integration
(1 row)
excel_test=>

Install PGEXCEL_BUILDER_PKG by running the script sql/PostgreSQL_excel_generator_pkg.sql in the Amazon RDS for PostgreSQL or Aurora PostgreSQL database:

$ psql -h pg-instance.xxxx.us-east-1.rds.amazonaws.com -p 5432 -U postgres -d excel_test -f sql/PostgreSQL_excel_generator_pkg.sql > excel_procs.log 2>&1 &

excel_test=> dn pgexcel*
List of schemas
Name | Owner
——————-+———-
pgexcel_generator | postgres
(1 row)

Create a sample procedure to generate an Excel report using the script sql/sample_proc_to_format_cells.sql

$ psql -h pg-instance.xxxx.us-east-1.rds.amazonaws.com -p 5432 -U postgres -d excel_test -f sql/sample_proc_to_format_cells.sql

excel_test=# x
Expanded display is on.
excel_test=> df sample_proc_to_format_cells
List of functions
-[ RECORD 1 ]——-+—————————————————————————————————————————————————————————————————————————————————————
Schema | public
Name | sample_proc_to_format_cells
Result data type |
Argument data types | p_directory character varying, p_report_name character varying, p_lambdafunctionname character varying, INOUT p_workbook pgexcel_generator.tp_book DEFAULT NULL::pgexcel_generator.tp_book, INOUT preturnval character varying DEFAULT NULL::character varying
Type | proc

Create a sample table to export data to a workbook sheet:

create table test_excel(id int,name varchar, eid varchar, mid varchar, nid int);

insert into test_excel values(generate_series(1,100),’name’||generate_series(1,100),’eid’||generate_series(1,100),’mid’||generate_series(1,100),generate_series(1,100));

Run the procedure sample_proc_to_format_cells:

call sample_proc_to_format_cells(‘REPORTS’,’excel_generate_test.xlsx’,’arn:aws:lambda:us-east-1:xxxxx:function:CreateExcelReport1′);

Note that you have to use your Lambda function ARN as an input.

Verify the Excel report generated in the S3 bucket.

The following screenshots show examples of the different sheets within the Excel workbook.

Cleanup

Cleanup the resources created to the solution implementation once you have tested the solution.

Delete the Amazon S3 bucket
Delete the Lambda function
Remove the IAM roles attached to Amazon RDS PostgreSQL instance
Remove the Policies and IAM roles
Drop the “excel_test” database

Summary

In this post, we demonstrated how you can generate the reports using the aws-aurora-rds-postgresql-excel-generator tool. While this post provided a procedure to generate a simple report, you can use the Excel functions to have the format and the number of sheets based on your requirements.

If you have any questions or suggestions about this post, leave a comment.

About the authors

Baji Shaik is a Database Consultant with Amazon Web Services. His background spans a wide depth and breadth of expertise and experience in SQL/NoSQL database technologies. He is a Database Migration Expert and has developed many successful database solutions addressing challenging business requirements for moving databases from on-premises to Amazon RDS and Aurora PostgreSQL/MySQL. He is an eminent author, having written several books on PostgreSQL. A few of his recent works include PostgreSQL Configuration, Beginning PostgreSQL on the Cloud and PostgreSQL Development Essentials. Furthermore, he has delivered several conference and workshop sessions.

Anuradha Chintha is a Lead Consultant with Amazon Web Services. She works with customers to build scalable, highly available, and secure solutions in the AWS Cloud. Her focus area is homogeneous and heterogeneous database migrations.

Uma Agarwal is an Engagement Manager at Amazon Web Services.

Read MoreAWS Database Blog

RELATED ARTICLES

LEAVE A REPLY

Please enter your comment!
Please enter your name here

Most Popular

Recent Comments