
Automating log uploads with gcloud transfer

Hi, my team just released the gcloud transfer command-line tool, and this tutorial will show you how to use it for a common task: uploading logs to the cloud.

Setup

You’ll need a device running a Linux operating system with at least 8 GB of RAM to continue. If you don’t have one lying around, it’s easy to spin up a Compute Engine virtual machine.
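
If you go the VM route, something like the following would do it. The instance name and zone here are just examples; e2-standard-2 covers the 8 GB of RAM mentioned above:

$ gcloud compute instances create log-uploader --machine-type=e2-standard-2 --zone=us-central1-a
$ gcloud compute ssh log-uploader --zone=us-central1-a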

Let’s create some logs to upload. In the real world, Google Cloud’s transfer service is a great tool if you have large amounts of data (terabytes+). But tutorials don’t usually ask people to create multiple hard drives’ worth of fake data. So let’s do this:

$ mkdir my-logs
$ cd my-logs
$ echo "i am a petabyte" > logs.txt

Perfect. That will fool them.

On to the gcloud CLI. If you haven’t already, install it; you should be prompted to log in to your Google account during the installation process.
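
If the installer doesn’t prompt you, or you’re on a machine that’s already set up, you can authenticate and point the CLI at a project manually. The project ID below is a placeholder:

$ gcloud auth login
$ gcloud config set project [your project ID]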

You’re probably wondering how much this tutorial will cost to complete in your Google Cloud project. At the time of writing, transfer jobs cost “$0.0125 per GB transferred to the destination successfully.” Here’s the current price table.
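
For a rough sense of scale: at that rate, a full terabyte (1,024 GB) comes out to about 1,024 × $0.0125 ≈ $12.80, and the 16-byte “petabyte” we just created rounds to effectively zero.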

Next, you’ll need a Google Cloud Storage bucket to upload to. Object storage also shouldn’t be very expensive, but please save resource names for cleanup at the end of the tutorial. Here’s the price table. You can create a bucket by running:

$ gsutil mb gs://[globally unique bucket ID]

Using gcloud transfer

To begin, let’s grant ourselves the permissions necessary to use all gcloud transfer features:

$ gcloud transfer authorize

Creating transfers from one cloud bucket to another is straightforward with gcloud transfer. Setting up your local file system to handle transfer jobs requires a little more work. Specifically, you need to install an “agent.” An agent is basically a Docker container that runs a program dedicated to copying files.

Before installing any agents, you need an agent pool. When a transfer job assigns work to an agent pool, any agent in that pool might end up copying files. Use agent pools to make sure that only agents with access to the files you want to move can execute a given transfer job.

$ gcloud transfer agent-pools create [pool ID]

Now, to install an agent on your system, run:

$ gcloud transfer agents install --pool=[pool ID]
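
Because agents are just Docker containers, you can sanity-check that one came up using Docker itself (assuming Docker is installed, which the install step requires):

$ docker ps
# Look for a recently started container running the transfer agent image.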

All right, now we can upload our fake logs! Storage Transfer Service works best with absolute paths, so use the “pwd” command to get the path to your current folder—you should be inside the “my-logs” folder from earlier.
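
For example (the exact path is hypothetical and will differ on your machine):

$ pwd
/home/[your username]/my-logs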

We require a “posix://” scheme for uploading from a POSIX file system (Linux and Mac). I know it’s a bit odd, but it leaves room for us to support transfer jobs dedicated to other file system types in the future (e.g. “ntfs://”).

$ gcloud transfer jobs create posix://$(pwd) gs://[bucket ID] --source-agent-pool=[pool ID]

Great, the above should return your new transfer job’s metadata. To monitor the transfer, run the commands below, plugging in the value of the “name” key from that metadata:

$ gcloud transfer jobs monitor [transfer job ID]
$ gsutil ls gs://[bucket ID]

Automation

Say we wanted to upload logs every midnight from 2022 to 2023. The ability to schedule regular transfers of large amounts of data is what differentiates gcloud transfer from tools like gcloud storage or gsutil. To do this, we just need to update the schedule properties of our job:

$ gcloud transfer jobs update [transfer job ID] --schedule-repeats-every=24h \
    --schedule-starts=2022-01-01 \
    --schedule-repeats-until=2023-01-01

If you have another machine and you don’t care which one uploads the logs, you could install an agent on that machine in the same pool as before.
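
With the gcloud CLI installed and authorized on that second machine, joining the existing pool is the same install command as before:

$ gcloud transfer agents install --pool=[pool ID]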

More realistically, if you want each machine in your fleet to upload logs to a different cloud destination, you can write a script to run once on each device. Just make sure the agent pool and destination arguments are different for each device, or more than one machine may upload to the same location.

You don’t have to go around running this script on multiple computers to complete the tutorial, but for demonstration purposes:

#!/bin/bash
# First argument $1 is the agent pool ID. Ex: "pool1"
# Second argument $2 is the source path. Ex: "posix:///tmp/logs"
# Third argument $3 is the destination path. Ex: "gs://my-bucket/log-dir1"

gcloud transfer agent-pools create "$1"
gcloud transfer agents install --pool="$1"
gcloud transfer jobs create "$2" "$3" \
    --source-agent-pool="$1" \
    --schedule-repeats-every=24h \
    --schedule-starts=2022-01-01 \
    --schedule-repeats-until=2023-01-01

If you’re interested in more complex scripting, the “jobs create” and “jobs run” commands have a “--no-async” flag you can use to wait until a transfer completes.
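
For example, here’s a sketch of triggering a one-off run that blocks until the transfer finishes, so a follow-up step only executes afterward (the job ID is a placeholder):

$ gcloud transfer jobs run [transfer job ID] --no-async && echo "logs uploaded"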

Teardown

This is the part where we delete everything to save you monthly costs.

First, let’s delete the transfer job:

$ gcloud transfer jobs delete [transfer job ID]
# If you lost your transfer job ID, you can try to find it by running the below command.
$ gcloud transfer jobs list --expand-table

Next, follow the instructions provided by this command to delete any agents you installed:

$ gcloud transfer agents delete --all

Now, let’s delete the empty agent pool:

$ gcloud transfer agent-pools delete [agent pool ID]
# If you lost your agent pool ID, you can try to find it by running the below command.
$ gcloud transfer agent-pools list

Lastly, let’s delete the Google Cloud Storage bucket and the fake logs on your device:

$ gsutil rm -r gs://[bucket ID]
# If you lost your bucket ID, you can try to find it by running the below command.
$ gsutil ls -b
$ rm logs.txt
$ cd ..
$ rmdir my-logs

Conclusion

Superb—you learned how to build an automated log uploader!

If you’re comparing gcloud transfer to other tools like gsutil, I linked some helpful articles in the “Related” section. TL;DR: gcloud transfer is for copying huge amounts of data (even petabytes!) and automating recurring copies. gsutil is better for less than a terabyte of data, and recurring copies have to be manually scripted (e.g. a cron job that calls gsutil).
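
To make that last point concrete, recurring gsutil copies mean maintaining something like this crontab entry yourself (the paths and bucket below are made up):

# Runs every midnight: mirror /tmp/logs into the bucket with gsutil.
0 0 * * * gsutil -m rsync -r /tmp/logs gs://my-bucket/logs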

If you’re copying files between clouds, we also support Amazon S3 and Azure Storage sources.

Congratulations on adding another tool to your Google toolkit!

Related Article

Faster Cloud Storage transfers using the gcloud command-line

The new gcloud storage enables super-fast data transfers using a new parallelization strategy and hashing library.
