Announcing StreamSets Transformer Engine 4.0.0

By mullaned2002

June 24, 2021

StreamSets is excited to announce the immediate availability of StreamSets Transformer Engine 4.0.0. It is a modern ETL engine that enables developers and data engineers to build data pipelines and transformations that execute on Apache Spark.

Highlights

This is our biggest release ever, and it includes many great new features and enhancements. For a detailed and complete list of enhancements, new features, bug fixes, and upgrade instructions, please refer to the Release Notes.

Let’s take a closer look at some of my favorite highlights.

StreamSets Summer ’21

Data engineers can now deploy StreamSets Transformer Engine 4.0.0 in the newly released StreamSets Summer ’21 beta. The beta gives them access to the power of the StreamSets DataOps Platform to handle the breadth of enterprise workloads, while letting them get up and running quickly in the cloud. NOTE: If you are an existing customer with an enterprise license, you can download the latest version through our StreamSets Support portal.

Spark 3.0 and Scala 2.12

StreamSets Transformer Engine 4.0.0 supports using Spark 3.0 and Scala 2.12. For information about the clusters that support Spark 3.0, see Cluster Compatibility Matrix. For information about the features available in different versions of Spark, see Spark Versions and Available Features.

Amazon Redshift

The new Amazon Redshift origin enables users to ingest data from Amazon Redshift tables without having to use the generic JDBC origin.

Amazon EMR Cluster Enhancements

Now users can run data pipelines on EMR 6.1.x or later 6.x.x clusters. For all supported versions, see Cluster Compatibility Matrix.

Bootstrap Actions: This new property lets users run bootstrap actions from executable files located on Amazon S3 or from scripts defined in the pipeline.

Databricks Cluster Enhancements

Users can now run pipelines on Databricks 7.x and 8.x clusters. For all supported versions, see Cluster Compatibility Matrix.
Init script: Users can now use Databricks cluster-scoped init scripts when provisioning a cluster on AWS, either by defining a DBFS script in the stage or by specifying the location of a script, such as one on Amazon S3.
Job failover: Jobs running on Databricks clusters can now be configured for failover.

Connection Catalog

A connection defines the information required to connect to an external system. Connections improve security and reusability: you create a connection once and then reuse it in multiple pipelines, which also makes pipelines easier to maintain and reduces the possibility of errors.

With this new release of StreamSets Transformer Engine 4.0.0, the following origins and destinations now support using Control Hub connections.

Origins
MySQL JDBC Table
Oracle JDBC Table
PostgreSQL JDBC Table
SQL Server JDBC Table
Amazon Redshift

Destination
Amazon Redshift

For detailed, technical information about StreamSets Transformer Engine, visit our documentation.

If you would like to see live demos of recently released features and enhancements, subscribe to StreamSets Live: Demos with Dash!

The post Announcing StreamSets Transformer Engine 4.0.0 appeared first on StreamSets.
