When working with an Oracle Multitenant database, you may be looking for ways to replicate ongoing changes to Amazon Simple Storage Service (Amazon S3) while filtering or identifying transactions from specific users.
In this post, we explore how to use AWS DMS to create a replication task with transformation rules to replicate ongoing changes from an Oracle Multitenant database to Amazon S3, while identifying transactions from specific users, which later can be processed or filtered by ongoing data pipelines.
For this post, we examine a use case for a data lake with an Oracle database with a multitenant architecture as our source. We want to replicate ongoing changes on the source database to Amazon S3, with the capability to identify transactions from specific users, which later can be filtered or processed through data pipelines. For example, we might have a requirement to ignore all transactions coming from a maintenance job run by a specific database user.
We use AWS DMS to replicate data from our Oracle database to Amazon S3. AWS DMS has native support to filter transactions from a user on Oracle Database by adding the extra connection attribute filterTransactionsOfUser when configuring an endpoint.
In AWS DMS, there are two methods for reading the redo logs when doing change data capture (CDC) for Oracle as a source: Oracle LogMiner and AWS DMS Binary Reader. LogMiner is an Oracle API to read the online redo logs and archived redo log files. Binary Reader is an AWS DMS method that reads and parses the raw redo log files directly. To support CDC for an Oracle Multitenant database as source, AWS DMS requires using Binary Reader when configuring the source endpoint for the Oracle database. The extra connection attribute we mentioned earlier (filterTransactionsOfUser) is only compatible when replicating data from Oracle when using LogMiner.
In this post, we create an AWS DMS CDC task with an Oracle Multitenant database as source and Amazon S3 as target. We provide a solution using AWS DMS transformation rules with an additional column as a flag for each row, which is part of the transaction, to indicate whether the row is to be filtered or considered by the ongoing data pipelines.
The high-level steps to implement this solution are as follows:
Create a source endpoint for the Oracle Multitenant database with Binary Reader.
Create a target endpoint for Amazon S3.
Create an AWS replication task with the necessary transformation rules.
Validate the provisioned resources and data replication.
The following diagram outlines the proposed solution architecture.
Complete the following prerequisite steps:
Provision your baseline networking resources in AWS for your AWS DMS replication instance.
Create an AWS DMS replication instance.
Make sure you have an existing Oracle database with a multitenant architecture and the necessary prerequisites to support CDC.
Create an S3 bucket, with no public access, in the same Region as your AWS DMS replication instance.
Create an AWS DMS endpoint for an Oracle database as source
Create an AWS DMS endpoint with the endpoint type as source for your Oracle Multitenant database. To support CDC for an Oracle Multitenant database as source, AWS DMS requires you to use Binary Reader when configuring the source endpoint with the following connection attributes:
After you create the endpoint, test the connection by selecting the endpoint and the AWS DMS replication instance. The status of the test should show as Successful before you can move forward.
Create an AWS DMS endpoint for Amazon S3 as target
After the endpoint is created, test the connection by selecting the endpoint and the AWS DMS replication instance. The status of the test must show as Successful before you can move forward.
For this post, we use a CSV file format with no compression.
Create an AWS DMS replication task from Oracle to Amazon S3
Because there is no direct connection attribute that AWS DMS supports to identify or filter transactions from a specific user for an Oracle Multitenant database, we use AWS DMS transformation rules to add a flag (column) to each row that is part of the transaction from a specific user. We can use this flag in downstream data pipelines to either filter these transactions or do further processing.
For this post, we use the following AWS DMS headers, which capture source transaction information:
AR_H_USER – This is the user on the source database that made the change
AR_H_OPERATION – This indicates whether the operation on the source database is INSERT, UPDATE, or DELETE
The following code contains the AWS DMS CDC replication task mapping rules. We add a new column with the AWS DMS expression AR_H_USER, which captures the source USERNAME that made the transaction, the expression AR_H_OPERATION, indicating the type of operation, and the column del_flag to indicate whether the transaction is from a specific user and of a specific type. In the following mapping example, any transaction from the ADMIN user and of type DELETE will set the column del_flag to 1, otherwise it will be 0 for all other transactions from any user.
Start the AWS DMS task and validate the CDC data is replicated from the source Oracle database to Amazon S3. You will notice three additional columns for every row being replicated from source to target, indicating the user who did the transaction, the type of operation, and a flag indicating whether the transaction is coming from a specific user and is of a specific type.
Validate the data
In this post, we want to identify transactions from the ADMIN user that are of DELETE type, so they can be filtered or further processed in downstream data pipelines.
To validate if the AWS DMS CDC replication task is replicating transactions from Oracle to Amazon S3, start by downloading the data files from Amazon S3, which has following folder structure:
Amazon S3 bucket name > Folder > Schema name > Table name
The following examples show the data within the files, with transactions from two different users, ADMIN and NEWADMIN. Notice the column del_flag when the transaction type is DELETE for both the users.
To avoid incurring ongoing charges, clean up the resources you created as part of this post.
The post showed how to use AWS DMS transformation rules with a CDC replication task to replicate changes from an Oracle Multitenant database to Amazon S3 and identify transactions from specific users, which can be transformed or processed by downstream data pipelines.
We invite you to leave your feedback in the comments section.
About the authors
Sharath Chandra Kampili is a Database Specialist Solutions Architect with Amazon Web Services. He works with AWS RDS team, focusing on commercial database engines like Oracle. Sharath works directly with AWS customers to provide guidance and technical assistance on the database projects, helping them improve the value of their solutions when using AWS.
Venu Koneru is a Database Specialist Solutions Architect at AWS with over 15 years of diverse experience in the IT Industry. Throughout his career, he has worn various hats, serving as Web Developer, Data Modeler, Database Administrator and Database Architect across Education, Travel and Finance Industries. His current role at AWS involves collaborating closely with customers to drive modernization of their databases in the cloud.
Read MoreAWS Database Blog