It’s surprising just how many business problems boil down to how quickly you can fill a bucket, particularly when that bucket is in Cloud Storage and you’re trying to fill it with data from a Compute Engine instance. While there are products like Storage Transfer Service that help move large amounts of data quickly, sometimes you need a more tactical solution for smaller migrations, which is why gsutil cp is so popular. By breaking up transfers and executing them in parallel, gsutil cp can be pretty fast.
That said, there are situations where gsutil will require tweaking to speed up transfers. If you are using gsutil for a large, single-file transfer, for example, you will need to change the default settings to get the best performance.
That’s why we’re excited to announce gcloud storage, a new set of Cloud Storage commands in Cloud SDK that has been engineered to be fast by default.
How gcloud storage works
Like gsutil before it, gcloud storage takes large files and breaks them down into pieces, so that transfers can best take advantage of the available bandwidth. What’s new in gcloud storage is its parallelization strategy, which models task management as a graph problem: each unit of work is a node, and each dependency is an edge. A task can start as soon as the tasks it depends on have finished, which allows more work to be done in parallel with far less overhead.
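To make the graph idea concrete, here is a minimal Python sketch of a scheduler that runs tasks in parallel while honoring dependency edges. It is an illustration only, not the gcloud storage implementation: the function name run_task_graph, the task names, and the use of the standard library's graphlib and thread pool are all assumptions made for this example.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter


def run_task_graph(tasks, dependencies, max_workers=4):
    """Run callables in parallel while honoring dependency edges.

    tasks: dict mapping a task name (node) to a zero-argument callable.
    dependencies: dict mapping a task name to the set of names it waits on (edges).
    Returns the task names in the order they finished.
    """
    sorter = TopologicalSorter(
        {name: dependencies.get(name, set()) for name in tasks}
    )
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    finished = []
    in_flight = {}  # future -> task name
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            # Submit every node whose dependencies have all completed.
            for name in sorter.get_ready():
                in_flight[pool.submit(tasks[name])] = name
            done, _ = wait(in_flight, return_when=FIRST_COMPLETED)
            for future in done:
                future.result()  # re-raise any exception from the task
                name = in_flight.pop(future)
                finished.append(name)
                sorter.done(name)  # may unlock downstream nodes
    return finished
```

In this sketch, a hypothetical "compose" task that depends on two "upload_part" tasks would only be submitted after both parts finish, while the two parts are free to run at the same time.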
Under the hood, gcloud storage also benefits from a new hashing library that enables faster integrity checking. It can also adjust its own settings based on the workload and local machine size to optimize for performance.
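As a rough illustration of what integrity checking involves, the sketch below hashes a stream in fixed-size chunks and compares the source digest with the digest of the received copy. The post does not name the hashing library or algorithm that gcloud storage uses, so everything here is an assumption: MD5 from Python's standard library was picked only because it needs no extra dependencies (Cloud Storage also supports CRC32C checksums, which the standard library does not provide), and the function names are hypothetical.

```python
import base64
import hashlib
import io


def md5_base64(stream, chunk_size=1024 * 1024):
    """Return the base64-encoded MD5 digest of a binary stream.

    The stream is read in chunks so that large files never have to fit
    in memory at once.
    """
    digest = hashlib.md5()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        digest.update(chunk)
    return base64.b64encode(digest.digest()).decode("ascii")


def transfer_is_intact(source_stream, received_stream):
    """Compare digests of the source and the received copy."""
    return md5_base64(source_stream) == md5_base64(received_stream)


if __name__ == "__main__":
    payload = b"to be, or not to be" * 1000
    print(transfer_is_intact(io.BytesIO(payload), io.BytesIO(payload)))
```

Because the digest is updated incrementally, the result does not depend on the chunk size, which is what lets a tool trade memory for throughput without changing the check itself.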
Just how fast is it?
We measured the performance of gcloud storage on the following environment:
us-east4 for both VM and bucket
n2d-standard-16 (8 vCPUs, 32 GB memory)
1x375GB NVMe in RAID0
When transferring 100 files of 100MB each, gcloud storage was 79% faster than gsutil on download and 33% faster on upload. Both tools used a composite upload strategy, in which a file is broken up, uploaded as separate pieces, and then recombined into a single file.
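The composite upload strategy can be sketched locally without touching a real bucket. In the example below, a plain Python dict stands in for the bucket, and the names split_ranges, composite_upload, and the ".partN" naming scheme are hypothetical; real tools perform these steps through the Cloud Storage API, and the details differ.

```python
from concurrent.futures import ThreadPoolExecutor


def split_ranges(total_size, part_size):
    """Return (start, end) byte offsets, end exclusive, covering total_size."""
    return [
        (start, min(start + part_size, total_size))
        for start in range(0, total_size, part_size)
    ]


def composite_upload(data, part_size, store, name, max_workers=4):
    """Simulate a composite upload into `store`, a dict standing in for a bucket.

    Returns the temporary part names that were uploaded and then removed.
    """
    ranges = split_ranges(len(data), part_size)

    def upload_part(index):
        start, end = ranges[index]
        part_name = f"{name}.part{index}"
        store[part_name] = data[start:end]  # stand-in for a network upload
        return part_name

    # Upload the pieces in parallel; map() returns results in input order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        part_names = list(pool.map(upload_part, range(len(ranges))))

    # Recombine the pieces in order, then delete the temporary parts.
    store[name] = b"".join(store[part] for part in part_names)
    for part in part_names:
        del store[part]
    return part_names
```

The point of the split is that each piece can travel over its own connection, so a single large file can use the available bandwidth in the same way that many small files do.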
When transferring a single large file, the difference is even more pronounced. With a 10GB file, gcloud storage was 94% faster than gsutil on download and 57% faster on upload. This performance improvement comes without the need for extensive testing and tweaking, making it easy to see much faster transfer times.
Try it for yourself
Once you update to the latest version of Cloud SDK, you can try out gcloud storage by running the following commands:
gcloud alpha storage ls gs://pub
gcloud alpha storage cp -r gs://pub/shakespeare/ .
Currently gcloud storage supports the following features that you use today in gsutil:
It’s worth noting that gcloud storage is in Preview. We will be adding more features over time to best reflect the capabilities of Cloud Storage. In the meantime we hope you’ll enjoy a faster way to fill your buckets.
Acknowledgements: Special thanks to the Google Cloud engineering team that made all this possible: Nick Hartunian, Tech Lead Dilip Pednekar, and Ross Rauber.