All you need to know about Datastream

By mullaned2002

July 30, 2021

1091

With data volumes constantly growing, many companies find it difficult to use data effectively and gain insights from it. Often these organizations are burdened with cumbersome and difficult-to-maintain data architectures.

One way that companies are addressing this challenge is with change streaming: the movement of data changes as they happen from a source (typically a database) to a destination. Powered by change data capture (CDC), change streaming has become a critical data architecture building block. We recently announced Datastream, a serverless change data capture and replication service. Datastream’s key capabilities include:

Replicate and synchronize data across your organization with minimal latency. You can synchronize data across heterogeneous databases and applications reliably, with low latency, and with minimal impact to the performance of your source. Unlock the power of data streams for analytics, database replication, cloud migration, and event-driven architectures across hybrid environments.Scale up or down with a serverless architecture seamlessly. Get up and running fast with a serverless and easy-to-use service that scales seamlessly as your data volumes shift. Focus on deriving up-to-date insights from your data and responding to high-priority issues, instead of managing infrastructure, performance tuning, or resource provisioning.Integrate with the Google Cloud data integration suite. Connect data across your organization with Google Cloud data integration products. Datastream leverages Dataflow templates to load data into BigQuery, Cloud Spanner, and Cloud SQL; it also powers Cloud Data Fusion’s CDC Replicator connectors for easier-than-ever data pipelining.

Click to enlarge

Datasteam use cases

Datastream captures change streams from Oracle, MySQL, and other sources for destinations such as Cloud Storage, Pub/Sub, BigQuery, Spanner and more. Some use cases of Datastream:

For analytics use Datastream with a pre-built Dataflow template to create up-to-date replicated tables in BigQuery in a fully-managed way.For database replication use Datastream with pre-built Dataflow templates to continuously replicate and synchronize database data into Cloud SQL for PostgreSQL or Spanner to power low-downtime database migration or hybrid-cloud configuration.For building event-driven architectures use Datastream to ingest changes from multiple sources into object stores like Google Cloud Storage or, in the future, messaging services such as Pub/Sub or Kafka Streamline real-time data pipeline that continually streams data from legacy relational data stores (like Oracle and MySQL) using Datastream into MongoDB.

How do you set up Datasteam?

Create a source connection profile.Create a destination connection profile.Create a stream using the source and destination connection profiles, and define the objects to pull from the source.Validate and start the stream.

Once started, a stream continuously streams data from the source to the destination. You can pause and then resume the stream.

Connectivity options

To use Datastream to create a stream from the source database to the destination, you must establish connectivity to the source database. Datastream supports the IP allowlist, forward SSH tunnel, and VPC peering network connectivity methods.

Private connectivity configurations enable Datastream to communicate with a data source over a private network (internally within Google Cloud, or with external sources connected over VPN or Interconnect). This communication happens through a Virtual Private Cloud (VPC) peering connection.

For a more in-depth look into Datastream check out the documentation.

For more #GCPSketchnote, follow the GitHub repo. For similar cloud content follow me on Twitter @pvergadia and keep an eye out on thecloudgirl.dev.

Cloud BlogRead More

Previous articleShip automations as product with Projects

Next article27 top SaaS companies for business

All you need to know about Datastream

Datasteam use cases

How do you set up Datasteam?

Connectivity options

Leverage enterprise data with Denodo and Vertex AI for generative AI applications

TypeScript takes aim at truthy and nullish bugs

Hex-LLM: High-efficiency large language model serving on TPUs in Vertex AI Model Garden

LEAVE A REPLY Cancel reply

Most Popular

Schneider Electric automates Salesforce account hierarchy management with generative artificial intelligence (AI) using Amazon Aurora and Amazon Bedrock

Leverage enterprise data with Denodo and Vertex AI for generative AI applications

TypeScript takes aim at truthy and nullish bugs

Make relevant movie recommendations using Amazon Neptune, Amazon Neptune Machine Learning, and Amazon OpenSearch Service

Recent Comments

EDITOR PICKS

Exploring the Click Element Variable in Google Tag Manager

How to track events with Google Tag Manager and Google Analytics

Data Layer Variable in GTM: What, Why, and Where?

POPULAR POSTS

Automate and implement version control for Amazon Kendra FAQs

AWS updates Amazon Bedrock service with new large language models

Google Analytics 4 Comparisons – How to use them?

POPULAR CATEGORY