Monday, October 14, 2024
No menu items!
HomeDatabase ManagementTransform column content and data type using AWS DMS

Transform column content and data type using AWS DMS

AWS Database Migration Service (AWS DMS) is a managed service that you can use to migrate data from various source data engines to supported targets. When migrating data from an Oracle source to PostgreSQL-compatible target, you often encounter special data type transformation scenarios that require using the rich set of transformation rules AWS DMS provides. One such example is the Oracle RAW data type which you can use to store Globally Unique Identifier (GUID) data to the Universally Unique Identifier (UUID) data type in PostgreSQL, because by default AWS DMS converts the Oracle RAW data type to BYTEA in PostgreSQL.

In this post, we show you two methods to perform data type conversion with AWS DMS: using materialized views and using expressions in your transformation rules.

Solution overview

AWS DMS allows you to use SQLite operators and SQLite functions within a transformation rule to compute column values that you can apply to any selected schema, table, or view. In addition, AWS DMS supports expressions, which means using SQLite operators or functions to define the data in a column. In this post, we provide examples of how to use SQLite operators and SQLite functions in transformation rules to convert Oracle GUID in RAW data type columns to UUID data type format on PostgreSQL.

We describe two methods in this post: using materialized views and using expressions in your transformation rules, which work for other data types or similar use cases. We provide CloudFormation template and scripts that can be deployed to test the solutions.

Method 1: Materialized view approach

This involves using materialized view to first convert GUID data to UUID format on the Oracle source itself, then using an AWS DMS task to load the data directly from the materialized view to the PostgreSQL target.

Method 2: Expressions in AWS DMS table-mapping approach

We use SQLite operators like Concatenate, SQLite functions like substring (SUBSTR), hexadecimal functions (HEX), and lowercase (LOWER) functions to break the GUID data into several groups separated by hyphens, specifically a group of 8 digits followed by three groups of 4 digits followed by a group of 12 digits, for a total of 32 digits representing the 128 bits.

The common workaround for such edge conversion requirements is to first transform the data at the source using a view, then create an AWS DMS task to load the data as presented by the view created at the source. Unlike materialized views, which are disk based (data is stored in a table and updated periodically based on the query definition), using the view approach is limited to full load only and doesn’t provide the ongoing data replication option needed for AWS DMS continuous data replication to the target database.

Prerequisites

The following are the prerequisites for this example:

Access to an AWS account with IAM policy
Familiarity with the following AWS services:
AWS CloudFormation
AWS Command Line Interface (AWS CLI)
AWS DMS
IAM
Amazon Relational Database Service (Amazon RDS) for Oracle
Amazon Relational Database Service (Amazon RDS) for PostgreSQL
Amazon Virtual Private Cloud (Amazon VPC)

Familiarity with Oracle SQL*Plus client.
Familiarity with PostgreSQL psql client.

Set up the environment

The following CloudFormation template provisions an Amazon VPC with an RDS for Oracle instance under the License Included (LI) model, an RDS for PostgreSQL instance, an Amazon Elastic Compute Cloud (Amazon EC2) instance, AWS DMS replication instance, AWS DMS endpoints, and AWS DMS replication tasks for the examples discussed in this post.

You can use your existing Oracle database or use your existing RDS for Oracle Bring Your Own License (BYOL) instance. With BYOL you must have the appropriate Oracle Database license (with Software Update License and Support) for the DB instance class and Oracle Database edition you wish to run. You must also follow Oracle’s policies for licensing Oracle Database software in the cloud computing environment. For more information, see Oracle licensing options.

Create a key pair if you don’t have one already.

Save the file; you need it to connect to the Amazon EC2 host in later steps.

Download the CloudFormation template from the GitHub repository.
On the AWS CloudFormation console, create a stack with the CloudFormation template dms-template.yaml
For Stack name, enter a name.
For DBPassword, enter a password. Remember this password for database connection.
For Ec2HostKeyPair, enter an existing key pair.
Accept the remaining defaults.

Acknowledge AWS CloudFormation might create IAM resources with customer names and choose Create stack. The deployment takes 20-30 minutes to complete.

Ensure the CloudFormation stack completed successfully by checking the AWS CloudFormation console.

On the Databases page of the Amazon RDS console, you can confirm that both the RDS for Oracle instance and RDS for PostgreSQL instance are created and available.

On the AWS DMS console, you can check that both AWS DMS endpoints are created and shows a successful status.

Configure Oracle database

In order for AWS DMS to capture ongoing transactions from the Oracle database, complete the following steps:

Connect to the EC2 host deployed by the CloudFormation stack:

$ ssh -o ServerAliveInterval=5 -i <keypair-file>.pem ec2-user@<ec2-instance-ipaddr>

Connect to the Oracle source database using the alias ora-src already configured in .bash_profile:

$ sqlplus sports@sports-ora-instance.xxxxxxxxxxxx.us-west-2.rds.amazonaws.com:1521/sportdb

Enable minimal supplemental logging:

— Check Minimal Supplemental logging is enabled
COLUMN log_mode HEADING ‘Log|Mode’ FORMAT A12
COLUMN log_min HEADING ‘Minimum|Supplemental|Logging?’ FORMAT A12
COLUMN log_pk HEADING ‘Primary Key|Supplemental|Logging?’ FORMAT A12
COLUMN log_fk HEADING ‘Foreign Key|Supplemental|Logging?’ FORMAT A12
COLUMN log_ui HEADING ‘Unique|Supplemental|Logging?’ FORMAT A12
COLUMN log_all HEADING ‘All Columns|Supplemental|Logging?’ FORMAT A12

SELECT log_mode
,supplemental_log_data_min log_min
,supplemental_log_data_pk log_pk
,supplemental_log_data_fk log_fk
,supplemental_log_data_ui log_ui
,supplemental_log_data_all log_all
FROM v$database;

— Enable minimal supplemental logging
EXEC rdsadmin.rdsadmin_util.alter_supplemental_logging(‘ADD’);

Grant the required privileges to the user (sports) used in the AWS DMS Oracle source endpoint connection.

For this post, you can use the file create-dbuser-dms-rds-oracle.sql. In this example, the sports user already exists, so the script just grants the privileges. If you use a different user other than sports in this example, make the same changes in the AWS DMS Oracle source endpoint.

Migrate the Oracle GUID type to UUID type on PostgreSQL

When migrating from Oracle to PostgreSQL, you might need to migrate Oracle GUID to UUID format on PostgreSQL. In Oracle, the SYS_GUID function generates and returns a globally unique identifier (RAW value) made up of 16 bytes (for example, D2F86F59C2C32687E0530100007F0FD4). On most platforms, the generated identifier consists of a host identifier, a process or thread identifier of the process or thread invoking the function, and a non repeating value (sequence of bytes) for that process or thread. The UUID data type stores UUIDs and is written as a sequence of lowercase hexadecimal digits, in several groups separated by hyphens, specifically a group of 8 digits followed by three groups of 4 digits followed by a group of 12 digits, for a total of 32 digits representing the 128 bits (for example, d2f86f59-c2c3-2687-e053-0100007f0fd4).

Method 1: Materialized view with fast a refresh on commit

This approach involves using materialized views to first convert GUID data to UUID format on the Oracle source itself, then using an AWS DMS task to load the data directly from the materialized view to the PostgreSQL target. This method has the advantage of not requiring any additional steps to transform data between AWS DMS and the PostgreSQL target. However, there are some disadvantages: It requires changes to the Oracle source database, and it requires the creation of additional objects within the Oracle source database.

To use this method, complete the following steps:

Create the TEST_TBL_1 table on the Oracle source and insert sample records:

— Create table TEST_TBL_1
CREATE TABLE sports.test_tbl_1
(
t_col1 VARCHAR2(10),
t_col2 RAW(16) DEFAULT SYS_GUID(),
t_col3 VARCHAR2(64) DEFAULT ‘Available’,
CONSTRAINT test_tbl_1_t_col1_pk PRIMARY KEY (t_col1)
);

INSERT INTO sports.test_tbl_1 (t_col1) VALUES (1000);
INSERT INTO sports.test_tbl_1 (t_col1) VALUES (1001);
COMMIT;

COL t_col2 FORMAT a40
COL t_col3 FORMAT a20
SELECT * FROM sports.test_tbl_1;

Create a materialized view log on SPORTS.TEST_TBL_1 (required for fast refresh), create the materialized view SPORTS.TEST_TBL_1_MVW (converting GUID data into UUID format), and enable supplemental logging on the materialized view:

— Create materialized view log on sports.test_tbl_1 table with PRIMARY KEY
CREATE MATERIALIZED VIEW LOG ON sports.test_tbl_1 WITH PRIMARY KEY, ROWID;

— Create materialized view sports.test_tbl_1_mvw and convert GUID value to UUID format
CREATE MATERIALIZED VIEW sports.test_tbl_1_mvw (t_col1, t_col2, t_col3)
LOGGING
NOCOMPRESS
NOCACHE
NOPARALLEL
REFRESH FAST
ON COMMIT
WITH PRIMARY KEY
DISABLE QUERY REWRITE
AS
SELECT t_col1
,REGEXP_REPLACE(RAWTOHEX(t_col2),'([A-F0-9]{8})([A-F0-9]{4})([A-F0-9]{4})([A-F0-9]{4})([A-F0-9]{12})’, ‘1-2-3-4-5’) AS t_col2
,t_col3
FROM sports.test_tbl_1;

COL t_col2 FORMAT a40
COL t_col3 FORMAT a20
SELECT * FROM sports.test_tbl_1_mvw;

The column T_COL2 is now VARCHAR2 and the values are now in UUID format for migration.

Enable supplemental logging based on the primary key for the materialized view table SPORTS.TEST_TBL_1_MVW:

— Enable supplemental logging on the materialized view sports.test_tbl_1_mvw
ALTER TABLE sports.test_tbl_1_mvw ADD SUPPLEMENTAL LOG DATA (PRIMARY KEY) COLUMNS;

Connect to the PostgreSQL target using the pgs-tgt alias already configured in .bash_profile:

$ psql -h sports-pgs-instance.xxxxxxxxxxxx.us-west-2.rds.amazonaws.com -p 5432 -U sports -d sportdb

In order to migrate the data in the column T_COL2 to UUID on the PostgreSQL target, pre-create the target table with column T_COL2 as UUID type

— Pre-create the target table with UUID data type
CREATE SCHEMA sports;
CREATE EXTENSION “uuid-ossp”;
CREATE TABLE sports.test_tbl_1
(
t_col1 VARCHAR(10) NOT NULL,
t_col2 UUID NOT NULL DEFAULT uuid_generate_v4(),
t_col3 VARCHAR(64) NOT NULL DEFAULT ‘Available’::character varying,
CONSTRAINT test_tbl_1_t_col1_pk PRIMARY KEY (t_col1)
);

SELECT * FROM sports.test_tbl_1;

Next, we start the DMS task sports-ora2pgs-guid-uuid-using-mviews with mapping rules to load from the Oracle materialized view (TEST_TBL_1_MVW) to the PostgreSQL target table (TEST_TABL_1). The task is created with full load and ongoing replication, and TRUNCATE_BEFORE_LOAD for table preparation because we already created the target table.

On the AWS DMS console, start the task sports-ora2pgs-guid-uuid-using-mviews.

The CloudFormation stack used the following mapping rules when creating the task:

DMS task settingstask-settings.json
DMS table-mappingtable-mapping-guid-uuid-using-mviews.json

See the following code:

In this scenario we only need to convert the name of the source Materialized View (TEST_TBL_1_VW) to the corresponding target table (test_tbl_1). That’s what the following code highlighted in red does.

{
“rules”: [
{
“rule-type”: “transformation”,
“rule-id”: “1”,
“rule-name”: “1”,
“rule-target”: “schema”,
“object-locator”: {
“schema-name”: “%”
},
“rule-action”: “convert-lowercase”
},
{
“rule-type”: “transformation”,
“rule-id”: “2”,
“rule-name”: “2”,
“rule-target”: “table”,
“object-locator”: {
“schema-name”: “%”,
“table-name”: “%”
},
“rule-action”: “convert-lowercase”
},
{
“rule-type”: “transformation”,
“rule-id”: “3”,
“rule-name”: “3”,
“rule-target”: “column”,
“object-locator”: {
“schema-name”: “%”,
“table-name”: “%”,
“column-name”: “%”
},
“rule-action”: “convert-lowercase”
},
{
“rule-type”: “selection”,
“rule-id”: “4”,
“rule-name”: “4”,
“object-locator”: {
“schema-name”: “SPORTS”,
“table-name”: “TEST_TBL_1_VW“,
“table-type”: “table
},
“rule-action”: “include”
},
{
“rule-type”: “transformation”,
“rule-id”: “5”,
“rule-name”: “5”,
“rule-action”: “rename“,
“rule-target”: “table“,
“object-locator”: {
“schema-name”: “SPORTS”,
“table-name”: “TEST_TBL_1_VW
},
“value”: “test_tbl_1
}
]
}

Connect to the PostgreSQL target and check if the first two records were loaded by the AWS DMS task:

SELECT * FROM sports.test_tbl_1;

You can now test the replication by running some inserts, updates, or deletes into SPORTS.TEST_TBL_1 in Oracle source database:

— Performing some DML operations on the TEST_TBL_1 table on Oracle
INSERT INTO sports.test_tbl_1 (t_col1) VALUES (1002);
INSERT INTO sports.test_tbl_1 (t_col1) VALUES (1003);
UPDATE sports.test_tbl_1 SET t_col3 = ‘Unavailable’ WHERE t_col1 = 1000;
COMMIT;

Check if the materialized view is also updated:

— Compare Master table SPORTS.TEST_TBL_1 and Materialized View SPORTS.TEST_TBL_1_MVW have same records on Oracle
SELECT * FROM sports.test_tbl_1;
SELECT * FROM sports.test_tbl_1_mvw ORDER BY t_col1;

Check the PostgreSQL target if the AWS DMS task successfully replicated these changes in the source to the target table on PostgreSQL:

SELECT * FROM sports.test_tbl_1 ODER BY t_col1;

In summary, this approach has the advantage of not requiring additional steps to transform data between AWS DMS and the PostgreSQL target. However, it requires changes to the Oracle source database and the creation of additional objects within the Oracle source database.

Method 2: Expressions in AWS DMS table mapping rules

This approach involves using SQLite functions in the AWS DMS task table mapping rules to transform the GUID data loaded from Oracle to UUID format before inserting it into the PostgreSQL target table.

The advantage of this method is that no new objects are needed on the Oracle source database to transform the data between AWS DMS and the PostgreSQL target. It requires familiarity with SQLite functions. Additionally, as of this writing, it requires additional post-migration steps on the PostgreSQL target.

To use this method, complete the following steps:

Create the TEST_TBL_2 table on the Oracle source and insert sample records:

— Create table TEST_TBL_2
CREATE TABLE sports.test_tbl_2
(
t_col1 VARCHAR2(10),
t_col2 RAW(16) DEFAULT SYS_GUID(),
t_col3 VARCHAR2(64) DEFAULT ‘Available’,
CONSTRAINT test_tbl_2_t_col1_pk PRIMARY KEY (t_col1)
);

INSERT INTO sports.test_tbl_2 (t_col1, t_col2) VALUES (1000, SYS_GUID());
INSERT INTO sports.test_tbl_2 (t_col1, t_col2) VALUES (1001, SYS_GUID());
INSERT INTO sports.test_tbl_2 (t_col1, t_col2) VALUES (1002, SYS_GUID());
COMMIT;

COL t_col2 FORMAT a40
COL t_col3 FORMAT a20
SELECT * FROM sports.test_tbl_2;

Enable supplemental logging for the table TEST_TBL_2:

— Enabling supplemental logging on TEST_TBL_2 table
ALTER TABLE sports.test_tbl_2 ADD SUPPLEMENTAL LOG DATA (PRIMARY KEY) COLUMNS;

Next, we create the table on the target PostgreSQL with UUID data type. As of this writing, AWS DMS doesn’t support the update-column action, so for this post we use a two-step approach. First, we use the add-column action to add a new column T_COL2_TMP. We then use T_COL2_TMP in the table mapping rules to store the UUID formatted value. When migration is complete, we then drop the original T_COL2 and rename T_COL2_TMP to T_COL2.

Connect to the target RDS for PostgreSQL instance using the alias pgs-tgt already configured in .bash_profile:

Create the target table. For this post, we add column T_COL2_TMP, which stores the converted UUID values from the Oracle source columns T_COL2:

— Pre-create the target table with UUID data type for column T_COL2_TMP and keep T_COL2 as BYTEA data type (AWS DMS by default map Oracle RAW data type to BYTEA on PostgreSQL)

CREATE TABLE sports.test_tbl_2
(
t_col1 varchar(10) NOT NULL,
t_col2_tmp uuid NOT NULL DEFAULT uuid_generate_v4(),
t_col3 varchar(64) NOT NULL DEFAULT ‘Available’::character varying,
t_col2 bytea,
CONSTRAINT test_tbl_2_t_col1_pk PRIMARY KEY (t_col1)
);

SELECT * FROM sports.test_tbl_2;

The CloudFormation stack already created the AWS DMS task sports-ora2pgs-guid-uuid-using-expression. AWS DMS supports expressions, which means using SQLite operators or functions to define the data in a column. In rule-id 2, we use the SQLite substring (SUBSTR), hexadecimal function (HEX), and lowercase (LOWER) functions to break the GUID data into several groups separated by hyphens, specifically a group of 8 digits followed by three groups of 4 digits followed by a group of 12 digits, for a total of 32 digits representing the 128 bits.

The CloudFormation stack used the following mapping rules when creating the task:

DMS task settingstask-settings.json
DMS table-mappingtable-mapping-guid-uuid-using-expression.json.
Create the rules with the following code:

{
“rules”: [
{
“rule-type”: “selection”,
“rule-id”: “1”,
“rule-name”: “1”,
“object-locator”: {
“schema-name”: “SPORTS”,
“table-name”: “TEST_TBL_2”
},
“rule-action”: “include”
},
{
“rule-type”: “transformation”,
“rule-id”: “2”,
“rule-name”: “2”,
“rule-action”: “add-column”,
“rule-target”: “column”,
“object-locator”: {
“schema-name”: “SPORTS”,
“table-name”: “TEST_TBL_2”
},
“value”: “t_col2_tmp”,
“expression”: “CASE LOWER(SUBSTR(HEX($T_COL2), 1, 8) || ‘-‘ || SUBSTR(HEX($T_COL2), 9, 4) || ‘-‘ || SUBSTR(HEX($T_COL2), 13, 4) || ‘-‘ || SUBSTR(HEX($T_COL2), 17, 4) || ‘-‘ || SUBSTR(HEX($T_COL2), 21, 12)) WHEN ‘—-‘ THEN NULL ELSE LOWER(SUBSTR(HEX($T_COL2), 1, 8) || ‘-‘ || SUBSTR(HEX($T_COL2), 9, 4) || ‘-‘ || SUBSTR(HEX($T_COL2), 13, 4) || ‘-‘ || SUBSTR(HEX($T_COL2), 17, 4) || ‘-‘ || SUBSTR(HEX($T_COL2), 21, 12)) END”,
“data-type”: {
“type”: “string”,
“length”: 60
}
},
{
“rule-type”: “transformation”,
“rule-id”: “3”,
“rule-name”: “3”,
“rule-target”: “schema”,
“object-locator”: {
“schema-name”: “%”
},
“rule-action”: “convert-lowercase”
},
{
“rule-type”: “transformation”,
“rule-id”: “4”,
“rule-name”: “4”,
“rule-target”: “table”,
“object-locator”: {
“schema-name”: “%”,
“table-name”: “%”
},
“rule-action”: “convert-lowercase”
},
{
“rule-type”: “transformation”,
“rule-id”: “5”,
“rule-name”: “5”,
“rule-target”: “column”,
“object-locator”: {
“schema-name”: “%”,
“table-name”: “%”,
“column-name”: “%”
},
“rule-action”: “convert-lowercase”
}
]
}

On the AWS DMS console, start the task sports-ora2pgs-guid-uuid-using-expression.

Check the PostgreSQL target to confirm the first three records were loaded by the AWS DMS task.

The original T_COL2 is loaded as BYTEA, which is the AWS DMS default mapping (Oracle RAW data type is mapped to BYTEA on PostgreSQL). However, column T_COL2_TMP is loaded with UUID format based on the expression used in the table mapping rule-id 2.

sportdb=> SELECT * FROM sports.test_tbl_2;
t_col1 | t_col2_tmp | t_col3 | t_col2
——–+————————————–+———–+————————————
1000 | d2f86f59-c2c8-2687-e053-0100007f0fd4 | Available | xd2f86f59c2c82687e0530100007f0fd4
1001 | d2f86f59-c2c9-2687-e053-0100007f0fd4 | Available | xd2f86f59c2c92687e0530100007f0fd4
1002 | d2f86f59-c2ca-2687-e053-0100007f0fd4 | Available | xd2f86f59c2ca2687e0530100007f0fd4
(3 rows)

You can now test the replication by running some inserts, updates, or deletes into SPORTS.TEST_TBL_2 in the Oracle source database:

— Performing some DML operations on the TEST_TBL_2 table on Oracle
INSERT INTO sports.test_tbl_2 (t_col1, t_col2) VALUES (1003, SYS_GUID());
INSERT INTO sports.test_tbl_2 (t_col1, t_col2) VALUES (1004, SYS_GUID());
INSERT INTO sports.test_tbl_2 (t_col1, t_col2) VALUES (1005, SYS_GUID());
INSERT INTO sports.test_tbl_2 (t_col1, t_col2) VALUES (1006, SYS_GUID());
INSERT INTO sports.test_tbl_2 (t_col1, t_col2) VALUES (1007, SYS_GUID());
UPDATE sports.test_tbl_2 SET t_col3 = ‘Unavailable’ WHERE t_col1 IN (1000,1001,1002);
COMMIT;

COL t_col2 FORMAT a40
COL t_col3 FORMAT a20
SELECT * FROM sports.test_tbl_2;

Check the PostgreSQL target to confirm the data is replicated:

SELECT * FROM sports.test_tbl_2 ORDER BY t_col1;

As part of the cutover strategy or when migration is completed, stop the AWS DMS task sports-ora2pgs-guid-uuid-using-expression via the AWS DMS console.

On the PostgreSQL target, after the AWS DMS task is stopped, you can drop T_COL2 and rename T_COL2_TMP back to T_COL2:

— At this stage, DMS task is stopped
ALTER TABLE sports.test_tbl_2 DROP column t_col2;
ALTER TABLE sports.test_tbl_2 RENAME COLUMN t_col2_tmp TO t_col2;
SELECT * FROM sports.test_tbl_2 ORDER BY t_col1;

In the future, when AWS DMS supports update-column, we can update existing columns without using add-column as a workaround option.

Clean up resources

Stop AWS DMS tasks and clean up by deleting the stack to avoid ongoing charges.

Stop the AWS DMS tasks after testing

Delete CloudFormation stack

Conclusion

In this post, we showed how you can use Oracle’s materialized views and AWS DMS transformations with SQLite functions to transform GUID data to UUID data type in PostgreSQL. You can adapt the solution in this post for other data type transformations when using AWS DMS for data migration in general.

If you have any comments or questions on this post, please share them in the comments.

About the Authors

Eli Doe is a Migration Specialist SA with Amazon Web Services. He works with customers and partners providing technical assistance and designing customer solutions on cloud migration projects, helping customers migrate and modernize their existing databases to the AWS Cloud.

Suvendu Kumar Patra is a Senior Data Architect with Amazon Web Services. He helps customers with their data strategy and migration journey, leveraging his years of industry and cloud experience.

Read MoreAWS Database Blog

RELATED ARTICLES

LEAVE A REPLY

Please enter your comment!
Please enter your name here

Most Popular

Recent Comments