GKE Autopilot mode gets burstable workloads and smaller Pod sizes

By mullaned2002

April 3, 2024

107

Autopilot mode for Google Kubernetes Engine (GKE) is our take on a fully managed, Pod-based Kubernetes platform. It provides category-defining features with a fully functional Kubernetes API with support for StatefulSet with block storage, GPU and other critical functionality that you don’t often found in nodeless/serverless-style Kubernetes offerings, while still offering a Pod-level SLA and a super-simple developer API.

But until now, Autopilot, like other products in this category, did not offer the ability to temporarily burst CPU and memory resources beyond what was requested by the workload. I’m happy to announce that now, powered by the unique design of GKE Autopilot on Google Cloud, we are able to bring burstable workload support to GKE Autopilot.

Bursting allows your Pod to temporarily utilize resources outside of those resources that it requests and is billed for. How does this work, and how can Autopilot offer burstable support, given the Pod-based model? The key is that in Autopilot mode we still group Pods together on Nodes. This is what powers several unique features of Autopilot such as our flexible Pod sizes. With this change, the capacity of your Pods is pooled, and Pods that set a limit higher than their requests can temporarily burst into this capacity (if it’s available).

With the introduction of burstable support, we’re also introducing another groundbreaking change: 50m CPU Pods — that is, Pods as small as 1/20th of a vCPU. Until now, the smallest Pod we offered was ¼ of a vCPU (250m CPU) — five times bigger. Combined with burst, the door is now open to run high-density-yet-small workloads on Autopilot, without constraining each Pod to its resource requests.

We’re also dropping the 250m CPU resource increment, so you can create any size of Pod you like between the minimum to the maximum size, for example, 59m CPU, 302m, 808m, 7682m etc (the memory to CPU ratio is automatically kept within a supported range, sizing up if needed). With these finer-grained increments, Vertical Pod Autoscaling now works even better, helping you tune your workload sizes automatically. If you want to add a little more automation to figuring out the right resource requirements, give Vertical Pod Autoscaling a try!

Here’s what our customer Ubie had to say:

“GKE Autopilot frees us from managing infrastructure so we can focus on developing and deploying our applications. It eliminates the challenges of node optimization, a critical benefit in our fast-paced startup environment. Autopilot’s new bursting feature and lowered CPU minimums offer cost optimization and support multi-container architectures such as sidecars. We were already running many workloads in Autopilot mode, and with these new features, we’re excited to migrate our remaining clusters to Autopilot mode.” – Jun Sakata, Head of Platform Engineering, Ubie

A new home for high-density workloads

Autopilot is now a great place to run high-density workloads. Let’s say for example you have a multi-tenant architecture with many smaller replicas of a Pod, each running a website that receives relatively little traffic, but has the occasional spike. Combining the new burstable feature and the lower 50m CPU minimum, you can now deploy thousands of these Pods in a cost-effective manner (up to 20 per vCPU), while enabling burst so that when that traffic spike comes in, that Pod can temporarily burst into the pooled capacity of these replicas.

What’s even better is that you don’t need to concern yourself with bin-packing these workloads, or the problem where you may have a large node that’s underutilized (for example, a 32 core node running a single 50m CPU Pod). Autopilot takes care of all of that for you, so you can focus on what’s key to your business: building great services.

Calling all startups, students, and solopreneurs

Everyone needs to start somewhere, and if Autopilot wasn’t the perfect home for your workload before, we think it is now. With the 50m CPU minimum size, you can run individual containers in us-central1 for under $2/month each (50m CPU, 50MiB). And thanks to burst, these containers can use a little extra CPU when traffic spikes, so they’re not completely constrained to that low size. And if this workload isn’t mission-critical, you can even run it in Spot mode, where the price is even lower.

In fact, thanks to GKE’s free tier, your costs in us-central1 can be as low as $30/month for small workloads (including production-grade load balancing) while tapping into the same world-class infrastructure that powers some of the biggest sites on the internet today. Importantly, if your startup grows, you can scale in-place without needing to migrate — since you’re already running on a production-grade Kubernetes cluster. So you can start small, while being confident that there is nothing limiting about your operating environment. We’re counting on your success as well, so good luck!

And if you’re learning GKE or Kubernetes, this is a great platform to learn on. Nothing beats learning on real production systems — after all, that’s what you’ll be using on the job. With one free cluster per account, and all these new features, Autopilot is a fantastic learning environment. Plus, if you delete your Kubernetes workload and networking resources when you’re done for the day, any associated costs cease as well. When you’re ready to resume, just create those resources again, and you’re back at it. You can even persist state between learning sessions by deleting just the compute resources (and keeping the persistent disks). Don’t forget there’s a $300 free trial to cover your initial usage as well!

Next steps:

Learn about Pod bursting in GKE.Read more about the minimum and maximum resource requests in Autopilot mode.Learn how to set resource requirements automatically with VPA.Try GKE’s Autopilot mode for a workload-based API and simpler Day 2 ops with lower TCO.Going to Next ‘24? Check out session DEV224 to hear Ubie talk about how it uses burstable workloads in GKE.

Cloud BlogRead More

Previous articleSeamlessly transition between no-code and code-first machine learning with Amazon SageMaker Canvas and Amazon SageMaker Studio

Next articleLegacy Integration vs. Generative Integration: Integration Strategies for 2024 and Beyond

GKE Autopilot mode gets burstable workloads and smaller Pod sizes

A new home for high-density workloads

Calling all startups, students, and solopreneurs

Leverage enterprise data with Denodo and Vertex AI for generative AI applications

TypeScript takes aim at truthy and nullish bugs

Hex-LLM: High-efficiency large language model serving on TPUs in Vertex AI Model Garden

LEAVE A REPLY Cancel reply

Most Popular

Schneider Electric automates Salesforce account hierarchy management with generative artificial intelligence (AI) using Amazon Aurora and Amazon Bedrock

Leverage enterprise data with Denodo and Vertex AI for generative AI applications

TypeScript takes aim at truthy and nullish bugs

Make relevant movie recommendations using Amazon Neptune, Amazon Neptune Machine Learning, and Amazon OpenSearch Service

Recent Comments

EDITOR PICKS

Exploring the Click Element Variable in Google Tag Manager

How to track events with Google Tag Manager and Google Analytics

Data Layer Variable in GTM: What, Why, and Where?

POPULAR POSTS

Use Amazon DynamoDB incremental exports to drive continuous data retention

How Striim Extends Azure Synapse Link

BigQuery row-level security enables more granular access to data

POPULAR CATEGORY