Simplify troubleshooting in Google Kubernetes Engine with new playbooks

By mullaned2002

July 13, 2023


Here at Google Cloud we are always trying to find new ways to simplify how our customers troubleshoot. We’re excited to announce the introduction of a new troubleshooting experience: recommended interactive playbooks for Google Kubernetes Engine (GKE).

When you hit an issue that is new to you but that we’ve seen many times before, these playbooks can help you resolve it faster and reduce your Mean Time to Resolution, or MTTR.

Let’s take a quick look at one of these new example playbooks.

Let’s say we have a GKE cluster and an application requesting more resources than are available to it, such as memory or CPU. In that situation, it’s often the case that a Pod (or Pods) will be marked as ‘unschedulable’.
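For context, Kubernetes records this state on the Pod itself: the `PodScheduled` condition is set to `False` with the reason `Unschedulable`. The sketch below is not part of the playbook. It assumes you have a Pod object as a dictionary in the shape returned by `kubectl get pod -o json`, and the function name and sample data are hypothetical.

```python
def is_unschedulable(pod: dict) -> bool:
    """Return True if the Pod's PodScheduled condition is False with reason Unschedulable."""
    conditions = pod.get("status", {}).get("conditions", [])
    return any(
        c.get("type") == "PodScheduled"
        and c.get("status") == "False"
        and c.get("reason") == "Unschedulable"
        for c in conditions
    )


# Hypothetical sample, trimmed to the fields the check reads.
sample_pod = {
    "metadata": {"name": "example-app-0"},
    "status": {
        "phase": "Pending",
        "conditions": [
            {
                "type": "PodScheduled",
                "status": "False",
                "reason": "Unschedulable",
                "message": "0/3 nodes are available: 3 Insufficient memory.",
            }
        ],
    },
}

if is_unschedulable(sample_pod):
    print(sample_pod["metadata"]["name"], "is unschedulable")
```

The condition's `message` field usually states why no node was suitable, which is a useful first clue before opening the playbook.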

A Pod being marked as ‘unschedulable’ is a common issue and something we have documented extensively, but let’s see how we can simplify the troubleshooting process.

In the screenshot below we’ve highlighted the notification from the cluster view that Pods are unschedulable.

If we click this notification we see a screen appear offering us a few ways to better understand this issue:

Clicking into the playbook, we can see a lot of information relevant to the issue at hand including relevant logs, metrics, and suggested next steps:

We can see from the logs and metrics that the Pods of the Deployment have requested more memory than is available, but that the node has ample resources available and there are no maximum limits on Pods being set. So to resolve this issue, we’ll need to modify the amount of memory the Pod requests, or increase the size of our cluster.
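To make the memory arithmetic concrete, here is a rough sketch of the comparison involved: a Pod fits on a node only if its memory request is no larger than the node's allocatable memory minus the requests already placed there. This is a simplification of what the scheduler does, not the playbook's own logic. The helper names and the example figures are hypothetical, and the parser handles only the common quantity suffixes.

```python
# Multipliers for common Kubernetes memory quantity suffixes (simplified).
_UNITS = {
    "Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "Ti": 1024**4,
    "k": 1000, "M": 1000**2, "G": 1000**3, "T": 1000**4,
}


def parse_memory(quantity: str) -> int:
    """Convert a quantity such as '512Mi' or '2Gi' to bytes."""
    for suffix in ("Ki", "Mi", "Gi", "Ti", "k", "M", "G", "T"):
        if quantity.endswith(suffix):
            return int(float(quantity[: -len(suffix)]) * _UNITS[suffix])
    return int(quantity)  # no suffix: already in bytes


def fits_on_node(request: str, allocatable: str, existing_requests: list[str]) -> bool:
    """Check whether a memory request fits in the node's remaining allocatable memory."""
    free = parse_memory(allocatable) - sum(parse_memory(r) for r in existing_requests)
    return parse_memory(request) <= free


# Hypothetical node: 4Gi allocatable, with 3Gi already requested by other Pods.
print(fits_on_node("2Gi", "4Gi", ["1Gi", "2Gi"]))    # request too large for the 1Gi left
print(fits_on_node("512Mi", "4Gi", ["1Gi", "2Gi"]))  # smaller request fits
```

The two calls mirror the two remedies: lower the Pod's request until it fits, or add capacity so that more allocatable memory is free.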

This dashboard is also customizable, so if you’d like you can add or remove components based on what’s most pertinent to you and your organization.

Finally, at the bottom of the playbook, under ‘Future Mitigation Tips’, you can also create an alert policy to look specifically for this issue:

When this alert fires, you’ll be able to acknowledge the incident or click the policy link to jump straight into this dashboard and begin troubleshooting:

This week we’re making two playbooks available: Unschedulable Pods, and a playbook for troubleshooting a Deployment whose Pods repeatedly crash and restart, a state commonly known as CrashLoopBackOff. We have playbooks for Memory and CPU scaling issues coming soon.

Both of the playbooks available this week will appear as notifications for clusters where issues are present, and we hope this helps you in your troubleshooting journey! As always, if you have any questions or feedback on the product, please let us know by leaving feedback under the question mark icon of the page.
