Getting to the cloud: Best practices for migrating from On-prem to Google Cloud using Storage Transfer Service

By mullaned2002

August 19, 2021

1559

Organizations have been moving their on-premises data and applications to the cloud for the past several years, driven by reasons as varied as application modernization and content delivery to archival. In particular, we have seen migration momentum pick up in sectors like media and entertainment, where customers are rethinking how they monetize and store their valuable historical content, often while exploring Google Cloud’s many analytical and AI solutions.

Many of these customers are interested in moving their unstructured data from on-premises appliances to Google’s Cloud Storage. Over the past year, we’ve noticed an uptick in larger migrations, where customers move tens of petabytes or more of on-premises file data to Google’s flexible, secure object storage.

For customers like Telecom Italia/TIM Brasil, Google’s fully managed Storage Transfer Service played a key role in making this transformation possible by moving data from on-premises filesystems to extensible, low-cost Cloud Storage over the network.

“Storage Transfer Service helped us move petabyte-scale data from on-premises filesystem to Google Cloud in a highly performant and fully-managed way,” said Auana Mattar, CIO at Telecom Italia/TIM Brasil. “Setting up the transfer pipeline required performing some tests to figure out the ideal number of agents and networking settings in our on-prem environment. Once the initial setup was done, transferring data was seamless, and the service was able to saturate a 20 Gbps Partner Interconnect link.”

While large-scale cloud migrations can be intimidating, there are a number of actions that customers can take to ensure that a multi-petabyte data transfer goes as smoothly as possible. In the past, we’ve shared general architectural guidance for customers new to their cloud journey. And for customers looking for options, Google has multiple paths to move on-premises file data to the cloud, including our fully offline Transfer Appliance.

In this blog post, we’ll provide an updated perspective focused on how to use Storage Transfer Service to move data from on-prem to the cloud. Specifically, we’ll look at three different factors to consider prior to moving large amounts of data from your on-premises filesystem to Cloud Storage with our Storage Transfer Service.

Understanding the source files and filesystem

If you are moving data from an on-premises filesystem, you should be aware of how your source files, and your source filesystem, can impact transfer performance.

Each copy you make to Cloud Storage incurs some overhead from associated operations like metadata transfer, checksumming, and encryption. This means that, for a given amount of storage, transferring large numbers of very small files will take longer. As a rule of thumb, Storage Transfer Service will be most performant when moving files that are 16 MB or larger.

If you have a large number of smaller files, you may choose to batch them using tools like tar and upload as a single object. This will improve transfer performance but will limit how you can use those files in Cloud Storage. It’s an option best considered for use cases like archival storage, where the data transferred to Google Cloud may not be managed or accessed regularly. Our Nearline, Coldline, and Archive archival tiers offer excellent performance should you ever need to retrieve the archived data.

The source filesystem may also slow down transfer performance, particularly if the filesystem has limited read throughput. Tools like Fio can be used to test read throughput. We’ve included a command below to run a series of 1MB sequential read operations in Fio and to generate a report:

Fio will then generate a report. The final line labeled ‘bw’ represents the total aggregate bandwidth of all threads, and it can be used as a proxy for read throughput. In general, you should strive for read throughput (‘bw’) that is 1.5x of your desired upload throughput or speed. (And one easy way to increase read throughput is to ensure that the filesystem itself is not imposing any limits on maximum throughput.)

Optimizing Storage Transfer Service resources

Within the Storage Transfer Service, there are a few settings that we can adjust ahead of time to ensure optimal performance for a larger workload.

First, we should ensure that we have the right number of transfer agents for our source data. We would advise that, for any transfer job larger than 1 GB, you start with at least three agents in separate VMs, with each agent assigned at least 4 vCPU and 8 GB of RAM. In addition to providing a foundation for performant data transfer, this architecture also ensures that the transfer is fault tolerant should one agent machine become unavailable.

Google Cloud supports up to 100 concurrent agents for a given Google Cloud project. To help you identify the right number of agents to support your workload, you should start your larger transfer first, then wait three minutes after adding each agent to ensure that throughput has stabilized.

In general, each agent can facilitate roughly 1 Gbps of throughput for up to 10 agents, at which point it may be necessary to add more agents for a very large amount of network bandwidth. For example, in one larger migration, a customer with 20 Gbps of dedicated network capacity ran ~30 agents at once. These numbers illustrate what was required at one enterprise data center. Across all of your environments, it is important to test and monitor your throughput via Cloud Monitoring to ensure you have the right configuration for your transfer goals.

Another area to optimize is where and how you install your agents. As we mentioned earlier, agents should be installed in separate VMs, and each host machine should dedicate at least 4 vCPUs and 8 GB of memory per agent. This is a starting off point, and larger, long-running transfers may require additional CPU or memory. For those longer jobs, we advise that you monitor CPU utilization and unused memory closely to ensure optimal performance. You should provision more CPUs when utilization exceeds 70%. Similarly, you should be ready to provision additional memory when the agent has less than 1GB of unused memory.

Preparing your network for large-scale data transfer

The third and final area to consider is your network connectivity. While it can be easy to reduce this to the bandwidth between the source filesystem and the Google Cloud bucket, the network includes two other components that can be easier to configure: first, the network interface from the on-premises agents to the WAN; and second, the agents’ connection to the on-premises filesystem.

For the first component, the network interface from the on-premises agents to the WAN, the general guidance is to not let this become a bottleneck. Specifically, you should ensure that this interface is greater than or equal to the bandwidth you require to read from the filesystem, plus the upload bandwidth to write to Google Cloud. In other words, if you plan on moving 10 Gbps of data from on-premises to Google Cloud, you will need 20 Gbps of bandwidth between the on-premises agents to the WAN: 10 Gbps to read from the networked filesystem, and 10 Gbps to transfer and write to Google Cloud.

On-premises filesystems, and on-premises networks, come in many flavors. For the second component, how the agents connect to an on-premises filesystem, our general rule is to be mindful of latency and to test regularly. It is essential to ensure that agents run on machines that can access a networked filesystem with very low latency.

Finally, if you are trying to maximize transfer performance, make sure you’ve configured your network to avoid bandwidth restrictions between on-premises filesystem and Google Cloud that might impact transfer speed. Storage Transfer Service will allow you to cap the bandwidth used by transfer, making it easy to minimize any impact on other production applications. Consider using tools like lperf3, tcpdump, and gsutil to measure the network bandwidth available to upload to Cloud Storage.

In particular, gsutil is worth some additional detail. Gsutil is a Python tool that can help you perform a number of object storage management tasks in Google Cloud, including checking your agent’s connection to the Cloud Storage APIs. It can be installed via the Google Cloud SDK, or separately. In this case, you should also ensure that gsutil is available in the same on-premises VM as the Storage Transfer Service agent.

If you’d like to use gsutil to test connectivity to Google Cloud, here’s the command:

CP is a copy command that lets you copy data from on-premises to the cloud. In this case, gsutil is copying a test document, test.txt, to ensure that you have a connection with the Google Cloud APIs. You will need to create a test document before running this command.

Test twice, transfer once

One overarching theme across all factors is that testing can help you understand performance. For many customers, a large-scale data transfer from an on-premises filesystem to Google Cloud is an unusual event. And as with any unusual event in enterprise IT, it is a great idea to make sure that each party – from network administrators to filesystem and storage experts to cloud architects – is able to test their domain multiple times, to ensure the event will proceed seamlessly.

As you fine tune your testing in advance of a data transfer, you may want to learn more about your options for obtaining more network bandwidth, orchestrating transfer from SMB filesystems, or even how to make the right choices to save money on object storage. And we plan to share more in our blog about how customers have used our transfer offerings in the months ahead. Advanced agent setup | Cloud Storage Transfer Service Documentation

For more information about Storage Transfer Service and how to get started, please take a look at our documentation or get started via the Google Cloud console.

Cloud BlogRead More

Previous articleRossi Residencial migrates SAP to the cloud with zero impact on operations

Next articleScaling data access to 10Tbps (yes, terabits) with Lustre

Getting to the cloud: Best practices for migrating from On-prem to Google Cloud using Storage Transfer Service

Understanding the source files and filesystem

Optimizing Storage Transfer Service resources

Preparing your network for large-scale data transfer

Test twice, transfer once

Leverage enterprise data with Denodo and Vertex AI for generative AI applications

TypeScript takes aim at truthy and nullish bugs

Hex-LLM: High-efficiency large language model serving on TPUs in Vertex AI Model Garden

LEAVE A REPLY Cancel reply

Most Popular

Schneider Electric automates Salesforce account hierarchy management with generative artificial intelligence (AI) using Amazon Aurora and Amazon Bedrock

Leverage enterprise data with Denodo and Vertex AI for generative AI applications

TypeScript takes aim at truthy and nullish bugs

Make relevant movie recommendations using Amazon Neptune, Amazon Neptune Machine Learning, and Amazon OpenSearch Service

Recent Comments

EDITOR PICKS

Exploring the Click Element Variable in Google Tag Manager

How to track events with Google Tag Manager and Google Analytics

Data Layer Variable in GTM: What, Why, and Where?

POPULAR POSTS

May 2021 – Product Newsletter

Accelerating Ulta Beauty’s modernization with managed containerized microservices

Poor cloud architecture and operations are killing cloud ROI

POPULAR CATEGORY

Getting to the cloud: Best practices for migrating from On-prem to Google Cloud using Storage Transfer Service

Understanding the source files and filesystem

Optimizing Storage Transfer Service resources

Preparing your network for large-scale data transfer

Test twice, transfer once

Registration is open for Google Cloud Next: October 12–14

LEAVE A REPLY Cancel reply

Most Popular

Recent Comments

EDITOR PICKS

POPULAR POSTS

POPULAR CATEGORY