
Automate the migration of Microsoft SSIS packages to AWS Glue with AWS SCT

When you migrate Microsoft SQL Server workloads to AWS, you might want to automate the migration and minimize changes to existing applications, while still choosing a cost-effective option without commercial licenses and with reduced operational overhead. For example, SQL Server workloads often use SQL Server Integration Services (SSIS) to extract, transform, and load (ETL) data. In this case, you can consider AWS Glue Studio as a cloud alternative for your data integration operations. AWS Glue is a serverless data integration service that makes it easy to discover, prepare, and combine data for analytics, machine learning, and application development. AWS Glue Studio is a graphical interface that makes it possible to create, run, and monitor ETL jobs in AWS Glue. To accelerate the migration from SSIS to AWS Glue, you can use the AWS Schema Conversion Tool (AWS SCT).

AWS offers different paths for your migration journey from Microsoft SQL Server to AWS. You can migrate to Amazon Relational Database Service (Amazon RDS) and choose an open-source engine or Amazon Aurora to avoid commercial licenses, backup administration, and data center maintenance. Alternatively, you can use Babelfish for Aurora PostgreSQL, which understands the SQL Server wire protocol, so you can continue using the existing queries and drivers for your applications. You can also use AWS Database Migration Service (AWS DMS), a managed migration and replication service that helps you move your databases and analytics workloads to AWS quickly and securely.

In this post, we demonstrate how AWS SCT can automate the conversion of your existing SSIS packages to AWS Glue. You can also use AWS SCT to convert SQL Server databases to, for example, Amazon RDS for PostgreSQL. If you're migrating to Babelfish (refer to Migrate from SQL Server to Amazon Aurora using Babelfish for more detailed steps), you can use the SQL/DDL code exported from Microsoft SQL Server together with the Babelfish Compass tool to assess which features are not compatible with Babelfish for PostgreSQL. After you convert your database schemas and ETL scripts, you can use AWS Glue jobs to perform the same data operations.

Solution overview

We focus on the conversion of SSIS packages that you use to perform ETL operations with the source SQL Server database. The target database used in this post is Babelfish for Aurora PostgreSQL, which provides native, semantically correct execution over the TDS wire protocol. This allows connections to the target database using the Microsoft SQL Server JDBC driver, which we use for illustration purposes as the connector for AWS Glue. We assume you have already converted your SQL Server database schemas and applied the converted code to your target database. With the target database configured, you can convert the SSIS packages with AWS SCT to AWS Glue Studio jobs that run against your Babelfish target database.

An SSIS package includes the necessary components, such as the connection manager, tasks, control flow, data flow, parameters, event handlers, and variables, to run a specific ETL task.

You don’t need an installed SSIS server; AWS SCT reads the project files (files with the extensions .dtsx, .conmgr, or .params) from the local folder and parses them. The conversion process works on the local SSIS files.

At a high level, the steps are as follows:

Create an AWS Glue custom connector for SQL Server.
Create an AWS Secrets Manager secret.
Create a connection and test your connector.
Add your SSIS packages as the ETL source and AWS Glue as the ETL target to your AWS SCT project.
Set up a mapping rule and configure connections.
Convert your source SSIS packages.
Configure and deploy an AWS Glue Studio job in AWS SCT.
Run the AWS Glue Studio job.

Prerequisites

For this walkthrough, you should have the following prerequisites:

An AWS account with permission to use AWS Glue and configure IAM roles. To create a role, refer to Create an IAM role for AWS Glue.
AWS SCT installed on your machine.
An SSIS project with ETL packages: .dtsx, .conmgr, and .params files.

Create an AWS Glue custom connector for SQL Server

In AWS Glue, a connector is an optional code package that assists with accessing data stores in AWS Glue Studio. You can use a natively supported data store, a connector from AWS Marketplace, or your own custom connectors.

You can create custom connectors for Spark, Athena, and JDBC data stores. AWS SCT requires a custom connector to replace the existing SSIS connections; we use the Microsoft SQL Server JDBC connector for illustration purposes.

Complete the following steps:

Download the JAR file.
Create an Amazon Simple Storage Service (Amazon S3) bucket or use an existing bucket.
Upload the downloaded JAR file to your S3 bucket.

After you upload the JDBC connector to Amazon S3, you can set up a custom connector, which holds the required information to set up a connection like class name or JDBC URL format. After you set your custom connector, you can create connections that connect to your data source and data targets in the AWS Glue jobs.
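If you prefer to script the bucket creation and JAR upload from the preceding steps, the following minimal boto3 sketch shows one way to do it. The bucket name and JAR file name are placeholders, not values from this post.

import boto3

s3 = boto3.client("s3")

bucket = "my-glue-connector-artifacts"      # placeholder bucket name
jar_file = "mssql-jdbc-driver.jar"          # placeholder name of the downloaded JDBC driver JAR

# Create the bucket if you don't have one yet
# (in Regions other than us-east-1, add a CreateBucketConfiguration with a LocationConstraint)
s3.create_bucket(Bucket=bucket)

# Upload the driver JAR so you can reference it as the connector S3 URL in AWS Glue Studio
s3.upload_file(jar_file, bucket, f"connectors/{jar_file}")

print(f"Connector S3 URL: s3://{bucket}/connectors/{jar_file}")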

On the AWS Glue console, choose Data connections in the navigation pane.
Choose Create custom connector.

Under Connector S3 URL, enter the S3 path of your JAR file.
Enter a name for your custom connector (in this example, SQLServer).
For Connector type, choose JDBC.
Enter com.microsoft.sqlserver.jdbc.SQLServerDriver for Class name.
Enter jdbc:sqlserver://${host}:${port};databaseName=${dbname};user=${username};password=${password} for JDBC URL base.

Adding placeholders in the form ${key} to the URL base allows them to be replaced with the actual values when you create the connection.

Enter ; for URL parameter delimiter.
Choose Save changes.
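AWS Glue performs this placeholder substitution for you at connection time. The following Python sketch is only meant to illustrate how the ${key} placeholders in the URL base resolve; the host, database, and credential values are made-up samples, and in AWS Glue they come from your connection and the Secrets Manager secret you create in the next section.

from string import Template

url_base = ("jdbc:sqlserver://${host}:${port};"
            "databaseName=${dbname};user=${username};password=${password}")

# Sample values for illustration only
values = {
    "host": "my-database-host.example.com",
    "port": "1433",
    "dbname": "demo_db",
    "username": "glue_user",
    "password": "example-password",
}

# Prints the fully resolved JDBC URL
print(Template(url_base).substitute(values))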

Create a Secrets Manager secret

To safely store the sensitive information in your connection string, you can use Secrets Manager and reference the secret when you create your connection in AWS Glue.

On the Secrets Manager console, choose Store a new secret.
For Secret type, choose Other type of secret.
Enter the key-value pairs for host, port, dbname, username, and password.
Choose Next.
Enter a name for your secret (in this example, SQL_SERVER_2014).
Choose Next.
On the Review page, review your secret details, then choose Store.
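You can also create the same secret with the AWS SDK. The following boto3 sketch stores the five key-value pairs; the connection values shown are placeholders that you would replace with your own.

import json
import boto3

secrets = boto3.client("secretsmanager")

# Placeholder connection details for the database you want AWS Glue to reach
secret_value = {
    "host": "my-database-host.example.com",
    "port": "1433",
    "dbname": "demo_db",
    "username": "etl_user",
    "password": "example-password",
}

# Store the secret under the name referenced later when creating the Glue connection
secrets.create_secret(
    Name="SQL_SERVER_2014",
    SecretString=json.dumps(secret_value),
)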

Create a connection for your custom connector

After you create your custom connector, you can use it to create connections to your specific data sources and targets. To connect to Babelfish for Aurora PostgreSQL, use port 1433 instead of the PostgreSQL port 5432 with the SQL Server custom connector.

On the AWS Glue Studio console, choose Connectors in the navigation pane.
Choose the connector you created (SQLServer).

Choose Create connection.
Enter the name for your connection (in this case, SQLServer2014-connection).
Choose the Secrets Manager secret created earlier (in this example, SQL_SERVER_2014).
Specify all necessary parameters in the Network options section, and make sure your network configuration allows access to your SQL Server database.
Choose Save changes.

You can test your connection by creating a new AWS Glue job and choosing your custom connector as the source.
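For example, a minimal AWS Glue (PySpark) script that reads a table through the custom connection might look like the following sketch. The connection option keys follow the pattern of the scripts AWS Glue Studio typically generates for custom JDBC connectors, and the table name is a placeholder; adjust both to your environment.

import sys
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read from the custom JDBC connection created earlier
# (connection and table names are examples)
source = glue_context.create_dynamic_frame.from_options(
    connection_type="custom.jdbc",
    connection_options={
        "connectionName": "SQLServer2014-connection",
        "dbTable": "dbo.Customers",
    },
    transformation_ctx="source",
)

print(f"Rows read through the connection: {source.count()}")
job.commit()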

Add your SSIS packages as the ETL source and AWS Glue as the ETL target to your AWS SCT project

After you have created and tested your custom connections, you can now focus on migrating your SSIS packages using AWS SCT.

First, you need to create a new AWS SCT project. Complete the following steps:

In AWS SCT, choose File, New project.
Provide a name and location to save the new project.

Now you can add your source ETL engine and target ETL engine.

Choose Add source.
Select SQL Server Integration Service from the category ETL.

Choose Next.
Choose Browse, select your local folder with SSIS packages (files with the extensions .dtsx, .conmgr, or .params) and choose Connect.

Choose Add target and select AWS Glue Studio from the category ETL.
Specify your AWS profile. To create a profile, refer to Using the AWS SCT user interface and Storing AWS service profiles in AWS SCT.
Choose Connect.

Set up a mapping rule and configure connections

To start a conversion, you need to create a mapping rule. A mapping rule defines the target platform for conversion of your source database schema. To create a mapping rule for your project, specify SSIS as the source and AWS Glue Studio as target, then choose Create mapping.

After you create the mapping, select Main view to display it.

If the source SSIS package has connections, use the wizard Configure connections to create a mapping between SSIS and AWS Glue Studio connections. To run the wizard, open the context menu (right-click) on the category Connection managers and choose Configure connections.

To configure the connection mapping, in Connection managers, choose the SSIS package with the connection, then specify the AWS Glue Studio connection (the one you created earlier in Create a connection for your custom connector) for the corresponding SSIS connection.

Convert your source SSIS packages

AWS SCT converts your selected SSIS packages to JSON files. Each JSON object represents a node in a directed acyclic graph (DAG). Find your converted files in the Package DAGs node in the right-hand tree.

In the source tree view, find category Packages, open the context (right-click) menu and then choose Convert package.

The SSIS package is transformed into a DAG JSON file. You can save these files to Amazon S3 by choosing Package DAGs, opening the context menu (right-click), and choosing Save to Amazon S3.

Configure and deploy your AWS Glue Studio job in AWS SCT

To configure and deploy your AWS Glue Studio job, complete the following steps:

In the target tree view, find the category DAGs.
Open the context menu (right-click) for your converted DAG, then choose Configure AWS Glue Studio Job.

You can choose multiple converted DAGs.

Modify the list of DAGs that will be used to create AWS Glue Studio jobs.
Specify AWS Glue Studio job basic properties, then choose Next.
Name – Enter the name of your AWS Glue Studio job. The default is the name of the converted DAG.
Script filename – Enter the name of your converted script. The default is the name of the converted DAG.
Job parameters – Enter run-time values that are necessary to run a job.

Specify advanced job properties (all required properties have default values), then choose Next.
IAM Role – Choose the IAM role that is used for authorization to resources used to run the job and access data stores.
Script file S3 path – Enter the Amazon S3 path to your converted script. The default is the Job folder within the Amazon S3 folder of the AWS Glue Studio connection.
Encrypt script using SSE-S3 – Choose this option to protect data using server-side encryption with Amazon S3-managed encryption keys (SSE-S3).
Temporary directory – Enter the Amazon S3 path to a temporary directory for intermediate results depending on your job.
Generated python library path – AWS SCT uses this path to automatically generate the path for Python libraries. You can’t edit this automatically generated path. To use additional Python libraries, enter the path in User python library path.
Worker type – Set the type of predefined worker that is used when a job runs. The default is G.1X.
Requested number of workers – The number of workers you want AWS Glue to allocate to this job. The default is 2.
Max concurrency – Enter the maximum number of concurrent runs that are allowed for this job. The default is 1. AWS Glue returns an error when this threshold is exceeded.
Job timeout (minutes) – Enter the timeout value on your ETL job as a safeguard against runaway jobs. The default is 2880 minutes (48 hours) for batch jobs. If the job exceeds this limit, the job run state changes to TIMEOUT.
Delay notification threshold (minutes) – Enter the threshold in minutes before AWS SCT sends a delay notification.
Number of retries – Enter the number of times, from 0 to 10, that AWS Glue should automatically restart the job if it fails. Jobs that reach the timeout limit are not restarted. The default is 0.

Choose Finish.
In the target tree view, find and expand ETL Jobs.
Open the context menu (right-click) for the ETL job that you configured and choose Create AWS Glue Studio Job.

The Create AWS Glue Studio Job option is available on the root category ETL Jobs as well as on the single object. You can create multiple AWS Glue Studio jobs at the same time.
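AWS SCT creates the job for you, but the same properties map onto the AWS Glue CreateJob API. The following boto3 sketch shows roughly how the values described above fit together; the job name, IAM role, bucket, and script names are placeholders.

import boto3

glue = boto3.client("glue")

glue.create_job(
    Name="ssis-package-etl",                                 # AWS Glue Studio job name (placeholder)
    Role="arn:aws:iam::123456789012:role/GlueJobRole",       # IAM role for the job (placeholder)
    Command={
        "Name": "glueetl",
        "ScriptLocation": "s3://my-glue-bucket/Job/ssis-package-etl.py",  # converted script (placeholder)
        "PythonVersion": "3",
    },
    Connections={"Connections": ["SQLServer2014-connection"]},
    WorkerType="G.1X",                               # worker type (default G.1X)
    NumberOfWorkers=2,                               # requested number of workers (default 2)
    ExecutionProperty={"MaxConcurrentRuns": 1},      # max concurrency (default 1)
    Timeout=2880,                                    # job timeout in minutes (48 hours)
    MaxRetries=0,                                    # number of retries (default 0)
    NotificationProperty={"NotifyDelayAfter": 60},   # delay notification threshold in minutes
)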

Run the AWS Glue Studio job

After your job has been created, it’s listed in your jobs list on the AWS Glue Studio console.

From there you can run your converted AWS Glue job and apply any required changes, or, if you prefer, convert and use it as a Spark script.

On the AWS Glue console, choose ETL Jobs in the navigation pane.
Choose the job that was converted.

You can run it by choosing Run.

You can open the Runs tab to verify the status of the job, and access its logs by selecting the run ID.
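You can also start and monitor the job programmatically. The following boto3 sketch uses the placeholder job name from the deployment step and polls the run until it reaches a terminal state.

import time
import boto3

glue = boto3.client("glue")
job_name = "ssis-package-etl"   # placeholder job name

# Start a run of the converted job
run_id = glue.start_job_run(JobName=job_name)["JobRunId"]

# Poll until the run reaches a terminal state
while True:
    state = glue.get_job_run(JobName=job_name, RunId=run_id)["JobRun"]["JobRunState"]
    print(f"Run {run_id}: {state}")
    if state in ("SUCCEEDED", "FAILED", "STOPPED", "TIMEOUT", "ERROR"):
        break
    time.sleep(30)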

Conclusion

In this post, we showed how to convert SSIS packages to AWS Glue Studio jobs using AWS SCT. This solution can help you automate your ETL package conversion, and it is usually part of a database migration from SQL Server to Amazon RDS.

For more information about the AWS SCT, refer to the AWS SCT User Guide.

About the Authors

Felipe Lopes Marins is a Partner Solutions Architect in the Public Sector team at AWS. He helps partners and customers to translate their business challenges into technical solutions using AWS. Data analytics and artificial intelligence are some of his new passions, after previously spending time working more in application development in different technologies such as containers, serverless, and DevOps.

Oleksandr Fedorenko is a Database Engineer on the AWS DMA team. He helps design and implement successful database migration solutions for the AWS SCT project. He is currently in charge of ETL conversion directions. Oleksandr and his team delivered the SSIS to AWS Glue and AWS Glue Studio conversions as well as the Informatica Redirect project.

Hugo Mineiro is a Senior Analytics Specialist Solutions Architect based in Geneva. He focuses on helping customers across various industries build scalable and high-performance analytics solutions. At AWS, he focuses on strategic accounts in the DACH region and helps customers accelerate their journey to the cloud and innovate using AWS analytics and machine learning services. He loves playing football and spending time with friends.
