Announcing Spot Pods for GKE Autopilot: save on fault-tolerant workloads

By mullaned2002

November 9, 2021

We launched GKE Autopilot back in February and since then, we’ve been hard at work adding functionality to deliver a fully featured, fully managed Kubernetes platform. Today, we’re excited to introduce Spot Pods.

(Not familiar with GKE Autopilot yet? Check out the Autopilot breakout session at Google Cloud Next ’21, which gives a rundown of everything this new Kubernetes platform can do. Customers like the Japanese healthcare startup Ubie are already realizing simpler operations thanks to Autopilot, allowing them to spend less time worrying about infrastructure and more time building their core business.)

Back to Spot Pods… Autopilot is great for running stable, production-grade workloads thanks to its Pod-level SLA, a first for GKE. You might, however, have other types of workloads that don’t need this high level of reliability, such as fault-tolerant batch workloads or dev/test clusters that can handle some disruption. Spot Pods give you a convenient and cost-effective way to run these kinds of workloads on GKE Autopilot. (GKE Standard users can also take advantage of spot pricing by running their GKE clusters and node pools on Spot VMs.)

When you run your workloads with Spot Pods, you receive a discount of 60% to 91% off our regularly priced Pods (see our pricing page for the current price). There is no hard limit on how long a Spot Pod can run, but it may be preempted and evicted at any time if the platform needs to reclaim the resources during periods of high demand.

How Spot Pods work

Spot Pods run on spare compute capacity in Google Cloud, which allows you to use them at a lower price than regular Autopilot Pods for as long as compute resources are available. If Google Cloud needs the resources for other tasks, GKE evicts your Spot Pods with a grace period of 25 seconds. By using a Kubernetes workload API like Deployment or Job, you can have evicted Spot Pods redeployed automatically as soon as capacity is available, so fault-tolerant workloads can pick up where they left off.

Spot Pods are available starting in GKE 1.21.4. To enable Spot Pods on your Deployment, just add a node selector for cloud.google.com/gke-spot: "true" to the Pod template.
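As a minimal sketch of a Deployment that uses this node selector, the manifest below requests Spot Pods for every replica. Only the cloud.google.com/gke-spot: "true" selector comes from this post. The Deployment name, labels, replica count, busybox image, and resource requests are illustrative assumptions, not values from the announcement.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: spot-example            # hypothetical name
spec:
  replicas: 3                   # assumed replica count
  selector:
    matchLabels:
      app: spot-example
  template:
    metadata:
      labels:
        app: spot-example
    spec:
      # The node selector that asks Autopilot to run these Pods as Spot Pods.
      nodeSelector:
        cloud.google.com/gke-spot: "true"
      containers:
      - name: worker
        image: busybox          # placeholder image for illustration
        command: ["sh", "-c", "while true; do sleep 30; done"]
        resources:
          requests:             # assumed requests; size these for your workload
            cpu: 250m
            memory: 512Mi
```

Applying a manifest like this with kubectl apply -f should be all that is needed on an Autopilot cluster that supports Spot Pods. The value "true" must be quoted, because node selector values are strings.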

When you ask for Spot Pods in this way, Autopilot automatically provisions nodes for them. Autopilot adds Kubernetes taints and tolerations so that your regular, critical Pods stay separated and don’t land on the same nodes as Spot Pods. All you need to do is request Spot Pods in your manifest — GKE handles the rest.

When GKE evicts a Spot Pod to reclaim capacity, your containers receive a SIGTERM signal and have up to 25 seconds to wrap up their work. Make the most of this window by setting terminationGracePeriodSeconds in your PodSpec and by shutting your container down gracefully when it receives SIGTERM.
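The sketch below shows one way the two pieces could fit together: the PodSpec sets terminationGracePeriodSeconds to 25, matching the window described above, and the container traps SIGTERM so it can finish up before exiting. The names, the busybox image, the resource requests, and the shell loop are assumptions for illustration. A real workload would handle the signal in its own code and save whatever state it needs.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: spot-graceful-example   # hypothetical name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: spot-graceful-example
  template:
    metadata:
      labels:
        app: spot-graceful-example
    spec:
      nodeSelector:
        cloud.google.com/gke-spot: "true"
      # Ask for the full 25-second window mentioned in this post.
      terminationGracePeriodSeconds: 25
      containers:
      - name: worker
        image: busybox          # placeholder image for illustration
        command:
        - sh
        - -c
        - |
          # On SIGTERM, do any cleanup (flush, checkpoint), then exit.
          trap 'echo "SIGTERM received, wrapping up"; exit 0' TERM
          # Stand-in for real work; short sleeps keep the trap responsive.
          while true; do
            sleep 1
          done
        resources:
          requests:             # assumed requests; size these for your workload
            cpu: 250m
            memory: 512Mi
```

The shell runs as the container's main process, so it receives the SIGTERM directly. If your application is started through a wrapper script, make sure the signal is forwarded to it, or the grace period will pass unused.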

Use Spot Pods to maximize your savings when you run fault-tolerant workloads on Autopilot clusters. For your regular Pods, you can also take advantage of Autopilot committed use discounts (CUDs), which launched earlier this year and offer discounts of up to 45%. CUDs don’t apply to Spot Pods, which are already heavily discounted, but they do offer a convenient way to save money on Pods that require a more stable environment. Regardless of your workload, GKE gives you a way to save.

Spot Pods are in Preview and available starting with GKE version 1.21.4. To get started with Spot Pods for GKE Autopilot, read the documentation for Spot Pods and create an Autopilot cluster in the Rapid release channel. For more capabilities like this, register to join us live on Nov 18th for Kubernetes Tips and Tricks to Build and Run Cloud Native Apps.
