Data in Cloud Storage is not tied to any one disk, machine or data warehouse. This makes the data more accessible to its intended users, who may be scattered across the globe. As a Cloud Storage administrator, you can choose whether your data resides in a single region or a group of regions, in line with where your users and compute are located and how they use the data. Different applications have different requirements for performance, location and redundancy, and Cloud Storage can accommodate almost any of them. However, circumstances change: budgets may be cut, workloads shift, and new business requirements come to light. A bucket you originally created as multi-region may no longer be the best fit; perhaps regional storage is better suited to your current workloads.
In this blog, we will outline the factors that should be analyzed when choosing between regional, dual-region, or multi-region Cloud Storage. Then, in our next blog, we’ll describe how to migrate your data from one location type to another through the example of a multi-region to regional migration.
Choosing location type
To store an object in Cloud Storage, you must first create a bucket to put it in. Upon creation, you must classify each bucket as residing in one of three location types:
Region: such as São Paulo
Dual-region: such as Tokyo and Osaka
Multi-region: such as the US
This classification is permanent for the bucket. However, as you will see in the next section, the data inside it can always be moved to a new bucket with a different location type.
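As a minimal sketch of what choosing a location at creation time might look like, the snippet below maps the three example locations above to location codes and passes one to the Python client library. The codes (SOUTHAMERICA-EAST1, ASIA1, US), the helper name create_example_bucket, and the use of the google-cloud-storage package with Application Default Credentials are assumptions added for illustration, not something this post prescribes.

```python
# Example location codes for the three location types named above.
# These codes are assumptions added for illustration; check the Cloud Storage
# locations documentation for the current list.
EXAMPLE_LOCATIONS = {
    "region": "SOUTHAMERICA-EAST1",  # São Paulo
    "dual-region": "ASIA1",          # Tokyo and Osaka
    "multi-region": "US",            # United States
}


def create_example_bucket(bucket_name: str, location_type: str):
    """Hypothetical helper: create a bucket in the example location for a type.

    Assumes the google-cloud-storage package is installed and that
    Application Default Credentials are configured.
    """
    # Imported lazily so the lookup table can be used without the library.
    from google.cloud import storage

    location = EXAMPLE_LOCATIONS[location_type]
    client = storage.Client()
    # The location is fixed at creation time and cannot be changed afterwards.
    return client.create_bucket(bucket_name, location=location)
```

Calling create_example_bucket("my-analytics-bucket", "region") would, under these assumptions, create a regional bucket in São Paulo; the bucket name here is a placeholder.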
Each location type comes with differences in price, availability and performance. The details can be referenced in the Cloud Storage documentation. For all location types, data can be accessed from any region at any time, but the cost and performance when doing so may differ.
Architecturally, multi-region storage differs from regional or dual-region storage in that Google chooses where to store your data across many different regions within a continent, and those regions are subject to change over time. Where the data is stored and where incoming traffic is directed are determined not only by geographical proximity, but also by storage capacity, network load, and a variety of other factors. This dynamic architecture allows Google to offer multi-region storage with better availability and redundancy than regional storage, without incurring the higher price of dual-region storage. However, it can also introduce unpredictable latency and higher network egress charges for cloud workloads when multi-region data is read from remote regions.
Regional storage will give you the best price for performance possible, as long as you’re processing and serving your data in the same region. However, regional storage comes with tradeoffs in availability and resiliency. Your data resides in a single region, so if that region were to experience an outage, your pipeline or application could be severely impacted. In addition, reads and writes originating in other regions have to travel to the storage region, which could add noticeable latency to your workloads.
Dual-region storage is a ‘best of both worlds’ solution, as it provides the ability to scale to TB per second (like regional), but also provides a second copy of data in a second region, protecting against regional outages. Similar to regional storage, dual-region storage provides customers with an environment to drive high-throughput analytical workloads by co-locating compute and storage in two regions of their choice.
The final thing to take into consideration is cost. Dual-region storage has a higher base storage cost than the other two location types, and, like multi-region storage, charges for object replication. However, depending on the amount of egress charges your workload will incur, dual-region may not always be the most expensive option overall. Use the Google Cloud pricing calculator to estimate your costs.
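To make the point about egress concrete, here is a rough back-of-the-envelope sketch. Every price and workload figure below is a made-up placeholder chosen only to illustrate the arithmetic, not a real Cloud Storage rate, and the names Rates and monthly_cost are hypothetical; use the pricing calculator for actual numbers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Hypothetical per-GB prices in USD; NOT real Cloud Storage prices."""
    storage: float      # per GB stored per month
    replication: float  # per GB written and replicated
    egress: float       # per GB read from outside the bucket's location


def monthly_cost(rates, stored_gb, written_gb, remote_read_gb):
    """Sum storage, replication and egress charges for one month."""
    return (rates.storage * stored_gb
            + rates.replication * written_gb
            + rates.egress * remote_read_gb)


# Placeholder rates: dual-region has the higher base price and a replication fee.
REGIONAL = Rates(storage=0.020, replication=0.00, egress=0.02)
DUAL_REGION = Rates(storage=0.036, replication=0.02, egress=0.02)

# Assumed workload: 10,000 GB stored, 1,000 GB written per month, and compute
# running in two regions. With one regional bucket, the second region reads
# 20,000 GB remotely; with a dual-region bucket covering both, it reads none.
regional_total = monthly_cost(REGIONAL, 10_000, 1_000, remote_read_gb=20_000)
dual_total = monthly_cost(DUAL_REGION, 10_000, 1_000, remote_read_gb=0)

print(f"regional:    ${regional_total:,.2f}")
print(f"dual-region: ${dual_total:,.2f}")
```

With these placeholder figures the regional bucket comes to roughly $600 a month and the dual-region bucket to roughly $380, because the egress charges outweigh the higher base price. A workload with little cross-region reading would tip the comparison the other way.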
With all this in mind, you must consider what kind of workloads your data will be serving. Analytical workloads, where compute should usually be co-located with storage, are particularly well suited to regional or dual-region storage. In contrast, any workload that serves content to a wide geographical area (e.g., e-commerce or website hosting) will likely find multi-region storage to be the most attractive option.
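The guidance above can be condensed into a rough rule of thumb. The function below is only a simplified sketch of that reasoning, with a hypothetical name and two hypothetical inputs; a real decision should also weigh cost, latency, and compliance requirements.

```python
def suggest_location_type(serves_wide_area: bool,
                          needs_regional_outage_protection: bool) -> str:
    """Simplified rule of thumb based on the workload guidance above."""
    if serves_wide_area:
        # Content served across a wide geography, e.g. website hosting.
        return "multi-region"
    if needs_regional_outage_protection:
        # Co-located compute that must survive the loss of one region.
        return "dual-region"
    # Co-located compute where a single region is acceptable.
    return "region"


# An analytics pipeline that must keep running through a regional outage.
print(suggest_location_type(serves_wide_area=False,
                            needs_regional_outage_protection=True))
```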
Next: Migrating from one location type to another
If you decide that a different location type suits your workloads better, you’ll need to migrate your data from one location type to another. You can read about how to do that in the next blog, Multi-region to Regional: A Cloud Storage Migration.