Tuesday, April 30, 2024

Introducing BigQuery cross-region replication: enhanced geo-redundancy for your data

Geographic redundancy is one of the keys to designing a resilient data lake architecture in the cloud. Common reasons customers replicate data geographically include providing low-latency reads (keeping data closer to end users), complying with regulatory requirements, colocating data with other services, and maintaining data redundancy for mission-critical applications.

BigQuery already stores copies of your data in two different Google Cloud zones within a dataset's region. In all regions, replication between zones uses synchronous dual writes, so in the event of either a soft zonal failure (such as a power failure or network partition) or a hard one (such as a flood, earthquake, or hurricane), no data loss is expected and you are back up and running almost immediately.

We are excited to take this a step further with the preview of cross-region dataset replication, which allows you to easily replicate any dataset, including ongoing changes, across cloud regions. In addition to ongoing replication use cases, you can use cross-region replication to migrate BigQuery datasets from one source region to another destination region.

How does it work?

BigQuery provides a primary and secondary configuration for replication across regions:

Primary region: When you create a dataset, BigQuery designates the selected region as the location of the primary replica.

Secondary region: When you add a dataset replica in a selected region, BigQuery designates it as a secondary replica. The secondary region can be any region of your choice, and you can have more than one secondary replica.

The primary replica is writeable, and the secondary replica is read-only. Writes to the primary replica are asynchronously replicated to the secondary replica. Within each region, the data is stored redundantly in two zones. Network traffic never leaves the Google Cloud network.

Although replicas reside in different regions, they share the same name. This means that your queries do not need to change when referencing a replica in a different region.

The following diagram shows the replication that occurs when a dataset is replicated.

Replication in action

The following workflow shows how you can set up replication for your BigQuery datasets.

Create a replica for a given dataset

To replicate a dataset, use the ALTER SCHEMA ADD REPLICA DDL statement.

You can add a single replica to any dataset within each region or multi-region. After you add a replica, it takes time for the initial copy operation to complete. You can still run queries referencing the primary replica while the data is being replicated, with no reduction in query processing capacity.

-- Create the primary replica in the primary region.
CREATE SCHEMA my_dataset OPTIONS(location='us-west1');

-- Create a replica in the secondary region.
ALTER SCHEMA my_dataset
ADD REPLICA `us-east1`
OPTIONS(location='us-east1');

To confirm that the secondary replica has been created successfully, query the creation_complete column in the INFORMATION_SCHEMA.SCHEMATA_REPLICAS view.

-- Check the status of the replica in the secondary region.
SELECT creation_time, schema_name, replica_name, creation_complete
FROM `region-us-west1`.INFORMATION_SCHEMA.SCHEMATA_REPLICAS
WHERE schema_name = 'my_dataset';

Query the secondary replica

Once initial creation is complete, you can run read-only queries against a secondary replica. To do so, set the job location to the secondary region in query settings or the BigQuery API. If you do not specify a location, BigQuery automatically routes your queries to the location of the primary replica.

-- Query the data in the secondary region.
SELECT COUNT(*)
FROM my_dataset.my_table;

If you are using BigQuery’s capacity reservations, you will need to have a reservation in the location of the secondary replica. Otherwise, your queries will use BigQuery’s on-demand processing model.
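As a rough sketch, a reservation and an assignment in the secondary region can be set up with BigQuery's reservation DDL. The administration project, reservation name, assignee project, and slot count below are hypothetical placeholders, not values from this post:

-- Create a reservation in the secondary region (names and slot count are illustrative).
CREATE RESERVATION `admin-project.region-us-east1.secondary-reservation`
OPTIONS (slot_capacity = 100);

-- Assign a project's query jobs to that reservation.
CREATE ASSIGNMENT `admin-project.region-us-east1.secondary-reservation.my-assignment`
OPTIONS (assignee = 'projects/my-project', job_type = 'QUERY');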

Promote the secondary replica as primary

To promote a replica to be the primary replica, use the ALTER SCHEMA SET OPTIONS DDL statement and set the primary_replica option. You must explicitly set the job location to the secondary region in query settings.

ALTER SCHEMA my_dataset SET OPTIONS(primary_replica = 'us-east1');

After a few seconds, the secondary replica becomes primary, and you can run both read and write operations in the new location. Similarly, the primary replica becomes secondary and only supports read operations.
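If you later need to fail back, for example after a migration rehearsal, the same statement promotes the replica in the original region again. The region below assumes the us-west1/us-east1 setup from the earlier examples:

-- Promote the replica in the original region back to primary.
ALTER SCHEMA my_dataset SET OPTIONS(primary_replica = 'us-west1');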

Remove a dataset replica

To remove a replica and stop replicating the dataset, use the ALTER SCHEMA DROP REPLICA DDL statement. If you are using replication for migration from one region to another region, delete the replica after promoting the secondary to primary. This step is not required, but is useful if you don’t need a dataset replica beyond your migration needs.

ALTER SCHEMA my_dataset
DROP REPLICA IF EXISTS `us-west1`;
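Putting the steps together, a full region migration using the us-west1 and us-east1 examples above looks like this (the wait in step 2 is a manual check, not a statement):

-- 1. Add a replica in the destination region.
ALTER SCHEMA my_dataset
ADD REPLICA `us-east1`
OPTIONS(location='us-east1');

-- 2. Wait until creation_complete is true in
--    `region-us-west1`.INFORMATION_SCHEMA.SCHEMATA_REPLICAS.

-- 3. Promote the replica in the destination region.
ALTER SCHEMA my_dataset SET OPTIONS(primary_replica = 'us-east1');

-- 4. Drop the replica in the source region.
ALTER SCHEMA my_dataset
DROP REPLICA IF EXISTS `us-west1`;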

Getting started

We are excited to make cross-region replication available in preview for BigQuery, allowing you to enhance your geo-redundancy and support region-migration use cases. Looking ahead, we plan to add a console-based user interface for configuring and managing replicas, as well as a cross-region disaster recovery (DR) feature that extends cross-region replication to protect your workloads in the rare case of a total regional outage. You can learn more about BigQuery and cross-region replication in the BigQuery cross-region dataset replication quickstart.

Cloud Blog
