Announcing new stream recovery capabilities for Datastream

By mullaned2002

July 1, 2024


In the complex and ever-changing world of data replication, pipelines are prone to failure. Pinpointing when and why a pipeline failed can be difficult, and restarting replication typically requires a series of manual steps that must be carried out carefully to minimize data integrity issues.

The new stream recovery feature in Datastream enables you to quickly resume your data replication with minimal to no data loss in situations such as database failover or prolonged network outages.

Consider a financial institution that uses Datastream to replicate transaction data from their operational database to BigQuery for analytics. Due to a hardware failure, the primary database instance undergoes an unplanned failover to a replica. The replication pipeline in Datastream is broken because the original source is now unavailable. Stream recovery allows the replication to resume from the failover database instance, ensuring no transaction data is lost.

Or think of an online retailer that uses Datastream to replicate customer feedback to BigQuery for sentiment analysis using BigQuery ML. A prolonged network outage disrupts the connection to the source database. By the time network connectivity is restored, some of the changes are no longer available on the database server. In this case, stream recovery allows the user to quickly resume the replication from the first available log position. While some feedback may be lost, the retailer prioritizes having the most recent data for ongoing sentiment analysis and trend identification.

Benefits of stream recovery

Stream recovery offers a number of benefits, including:

Reduced data loss: Recover from data loss caused by database instance failovers, unintended log file deletion, and other incidents.

Reduced downtime: Minimize downtime by quickly recovering your stream and resuming ongoing CDC ingestion.

Simplified recovery: Easily recover your stream via a simple and intuitive interface.

How to use stream recovery

Stream recovery provides a few options for you to choose from, depending on the specific failure scenario and the availability of recent log files. For MySQL and Oracle, you can choose to retry from the current log position, skip the current position and stream from the next available position, or skip the current position and stream from the most recent position. You also have the option to provide a specific log position, e.g., the Log Sequence Number (LSN) or Change Sequence Number (CSN), for the stream to resume from, which gives you fine-grained control to ensure that no data is lost or duplicated in the destination.
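The choice among these options can be sketched as a small decision helper. This is an illustrative policy only, not Datastream's own logic: the enum, the function name, and its parameters are hypothetical and simply restate the options described above.

```python
from enum import Enum


class RecoveryOption(Enum):
    """The recovery choices described above for MySQL and Oracle sources."""
    RETRY_CURRENT = "retry from the current log position"
    NEXT_AVAILABLE = "skip to the next available position"
    MOST_RECENT = "skip to the most recent position"
    SPECIFIC = "resume from a specific log position"


def choose_recovery_option(current_position_available,
                           known_position=None,
                           prefer_freshness=False):
    """Pick a recovery option (hypothetical policy, for illustration).

    current_position_available: True if the log position where the stream
        stopped can still be read from the source.
    known_position: a log position the operator has identified, or None.
    prefer_freshness: True if the newest data matters more than
        minimizing the gap in replicated changes.
    """
    if known_position is not None:
        # An explicit position gives the finest control over loss/duplication.
        return RecoveryOption.SPECIFIC
    if current_position_available:
        # Nothing was lost on the source, so a plain retry is enough.
        return RecoveryOption.RETRY_CURRENT
    if prefer_freshness:
        return RecoveryOption.MOST_RECENT
    # Keeps the gap as small as the remaining logs allow.
    return RecoveryOption.NEXT_AVAILABLE
```

In the online retailer scenario, where some logs were purged during the outage and no specific position is known, `choose_recovery_option(False)` selects the next available position.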

For PostgreSQL sources, you can create a new replication slot in your PostgreSQL database and instruct Datastream to resume streaming from the new replication slot.
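As a rough sketch of the first step, the helper below builds the SQL statement that creates a logical replication slot. `pg_create_logical_replication_slot` is a standard PostgreSQL function. The helper name and the slot name are hypothetical, and the `pgoutput` plugin is an assumption about what your Datastream configuration expects, so confirm it against the Datastream documentation before use.

```python
import re

# PostgreSQL slot names may contain lower-case letters, digits, and
# underscores; 63 characters is the usual identifier length limit.
_SLOT_NAME = re.compile(r"[a-z0-9_]{1,63}")

# Assumption: the stream reads changes through the pgoutput plugin.
_PLUGIN = "pgoutput"


def create_slot_sql(slot_name):
    """Return the SQL that creates a new logical replication slot.

    The name is validated before being placed in the statement, so the
    returned string is safe to run as-is in a SQL client.
    """
    if not _SLOT_NAME.fullmatch(slot_name):
        raise ValueError(f"invalid replication slot name: {slot_name!r}")
    return (
        "SELECT pg_create_logical_replication_slot("
        f"'{slot_name}', '{_PLUGIN}');"
    )
```

Run the returned statement on the source database with a role that has replication privileges, then point the stream at the new slot name as described above.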

Starting a stream from a specified position

In addition to stream recovery, there are a number of scenarios where you may need to start or resume a stream from a specific log position. For example, the source database may have been upgraded or migrated, or historical data may already exist in the destination and you’d like CDC to pick up from the point in time where that historical data ends. In these cases, you can use the stream recovery API to specify a starting position before starting the stream.
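A minimal sketch of what such a request might look like is shown below. It only builds the URL and JSON body and does not send anything. The endpoint shape (`:run` on the stream resource) and the field names (`cdcStrategy`, `specificStartPosition`, `mysqlLogPosition`, `oracleScnPosition`) reflect one reading of the Datastream v1 REST reference and should be verified there. The function name and the sample values are hypothetical.

```python
def build_run_request(stream_name, *, mysql_log_file=None,
                      mysql_log_position=None, oracle_scn=None):
    """Return (url, body) for starting a stream at a specific position.

    stream_name: full resource name, for example
        "projects/PROJECT/locations/REGION/streams/STREAM".
    Provide either the MySQL log file and position, or the Oracle SCN.
    """
    # The MySQL file and position only make sense as a pair.
    if (mysql_log_file is None) != (mysql_log_position is None):
        raise ValueError("mysql_log_file and mysql_log_position go together")

    has_mysql = mysql_log_file is not None
    has_oracle = oracle_scn is not None
    if has_mysql == has_oracle:
        raise ValueError("provide exactly one kind of start position")

    if has_mysql:
        position = {
            "mysqlLogPosition": {
                "logFile": mysql_log_file,
                "logPosition": mysql_log_position,
            }
        }
    else:
        # Assumed to be sent as a string, like other 64-bit integers in JSON.
        position = {"oracleScnPosition": {"scn": str(oracle_scn)}}

    url = f"https://datastream.googleapis.com/v1/{stream_name}:run"
    body = {"cdcStrategy": {"specificStartPosition": position}}
    return url, body
```

The returned pair can be passed to any HTTP client as an authenticated `POST`. Keeping request construction separate from sending makes it easy to review the exact position before the stream starts.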

Get started

Stream recovery is now generally available in the Google Cloud console and API for all available Datastream sources in all Google Cloud regions.

To learn more about stream recovery, please visit the Datastream documentation.
