Create PromQL alerts in Cloud Monitoring now in Public Preview

By mullaned2002

August 10, 2023

Last year, we introduced Managed Service for Prometheus so that you can scale your Prometheus environment more easily than by hosting it yourself. As part of that announcement, we also added support for Prometheus’ popular PromQL query language in Cloud Monitoring. Since then, we’ve heard from both experienced and new Prometheus users that they’d like to manage alerts in a single ecosystem. Today, we’re excited to announce that Cloud Monitoring now supports alerting using PromQL in Public Preview. You can now create globally scoped alerting policies based on PromQL queries alongside your Cloud Monitoring metrics and dashboards, without having to maintain backend services.

In this release, you can:

Write globally scoped PromQL-based alerting policies in Cloud Monitoring

Reference Prometheus, GCP system, and custom metrics in your alerting policy

Route the notification to any Cloud Monitoring-supported notification channel. Use Email, Slack, SMS, and Mobile push to send the notification to your team members. Use Webhooks to send the notification to any public endpoint or Pub/Sub for any private endpoint.

Customize the subject line in an Email notification channel

Easily migrate your existing Prometheus alert_rules to Cloud Monitoring

Manage your configuration with Terraform

If you already have Prometheus alert rules, then you can migrate them to Cloud Monitoring alerting policies containing a PromQL query. You can also create your own PromQL alerting policies directly in Cloud Monitoring by using the Monitoring API or the gcloud CLI.

Understanding alerting policies

The following Prometheus alert rule fires when a Kubernetes volume has less than 10% of its capacity available, that is, when it is more than 90% full. Let’s review a few of its fields:

Alert: Sets a name for the alert to help users identify what’s happening

Expr: The PromQL expression to evaluate. In this case, the expression computes the percentage of the volume’s capacity that is still available; if that value drops below 10, then the alert condition is true.

For: Specifies the length of time during which each evaluation of the query must generate a `true` value before the alert fires.

Summary: The customized subject line to be used in the alert.

The labels and annotations fields provide additional information about the alert and can be used for passing additional context or actions.

```yaml
- alert: KubernetesVolumeOutOfDiskSpace
  expr: kubelet_volume_stats_available_bytes / kubelet_volume_stats_capacity_bytes * 100 < 10
  for: 120s
  labels:
    severity: warning
  annotations:
    summary: Kubernetes Volume out of disk space (instance {{ $labels.instance }})
    description: "Volume is almost full (< 10% left)\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
```
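To make the expr and for fields concrete, here is a minimal Python sketch of how a rule like this one could be evaluated over time. It is an illustration only, not how Prometheus or Cloud Monitoring is implemented: the function names, the sample data, and the 30-second evaluation interval are all assumptions made for the example.

```python
from typing import List, Optional, Tuple

# Each sample is (timestamp_seconds, available_bytes, capacity_bytes).
Sample = Tuple[int, float, float]


def percent_available(available_bytes: float, capacity_bytes: float) -> float:
    """Mirror the rule's expr: available / capacity * 100."""
    return available_bytes / capacity_bytes * 100


def alert_fires(samples: List[Sample], threshold: float = 10.0,
                for_seconds: int = 120) -> bool:
    """Return True once the condition has held continuously for `for_seconds`."""
    pending_since: Optional[int] = None  # when the condition first became true
    for timestamp, available, capacity in samples:
        if percent_available(available, capacity) < threshold:
            if pending_since is None:
                pending_since = timestamp
            if timestamp - pending_since >= for_seconds:
                return True
        else:
            pending_since = None  # a single false evaluation resets the timer
    return False


# Hypothetical evaluations every 30 seconds on a 100 GiB volume.
GIB = 1024 ** 3
steady = [(t, 8 * GIB, 100 * GIB) for t in range(0, 150, 30)]
flapping = [
    (0, 8 * GIB, 100 * GIB),
    (30, 8 * GIB, 100 * GIB),
    (60, 15 * GIB, 100 * GIB),  # briefly recovers above 10% available
    (90, 8 * GIB, 100 * GIB),
    (120, 8 * GIB, 100 * GIB),
]

print(alert_fires(steady))    # condition true for the full 120 seconds
print(alert_fires(flapping))  # timer reset at t=60, so not yet firing
```

In this sketch the steady series fires because the condition stays true for the full 120 seconds, while the flapping series does not, because one evaluation above the threshold restarts the wait.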

Migrate alert rules to Cloud Monitoring

For most Prometheus users, alert rules are stored in a Prometheus config file or rules file. You can now migrate these files to Cloud Monitoring alerting policies with PromQL queries. This way, you can store all of your alert policies in one place. Let’s look at a sample migration process:

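The following PromQL expression fires when the 95th-percentile request latency of any instance exceeds 0.003 seconds. The day_of_week() clause restricts the alert to selected days of the week (intended here to cover weekdays):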

```promql
quantile by (instance)(0.95, avg by (instance)(sum by (instance)(rate(http_request_duration_seconds_sum[5m]))/sum by (instance)(rate(http_request_duration_seconds_count[5m])))) > 0.003 and on () (day_of_week() == 5 or day_of_week() > 1)
```
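The arithmetic inside this expression can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the input rates are hypothetical, and the sketch evaluates a single instance at a single point in time rather than a full set of time series. It relies on two PromQL facts: dividing the rate of the _sum series by the rate of the _count series yields the mean seconds per request, and day_of_week() returns a value from 0 to 6 in UTC, where 0 is Sunday.

```python
from datetime import date


def average_latency_seconds(sum_rate: float, count_rate: float) -> float:
    """rate(..._sum[5m]) / rate(..._count[5m]): mean seconds per request."""
    return sum_rate / count_rate


def prometheus_day_of_week(day: date) -> int:
    """PromQL day_of_week() uses 0 = Sunday; Python's weekday() uses 0 = Monday."""
    return (day.weekday() + 1) % 7


def day_clause_matches(day: date) -> bool:
    """Mirror `day_of_week() == 5 or day_of_week() > 1` from the expression."""
    dow = prometheus_day_of_week(day)
    return dow == 5 or dow > 1


def condition_is_true(sum_rate: float, count_rate: float, day: date,
                      threshold: float = 0.003) -> bool:
    """Latency above the threshold AND the day clause matching."""
    latency = average_latency_seconds(sum_rate, count_rate)
    return latency > threshold and day_clause_matches(day)


# Hypothetical instance: 0.5 seconds of request time per second across
# 100 requests per second, which is 0.005 seconds per request.
thursday = date(2023, 8, 10)
sunday = date(2023, 8, 13)
print(condition_is_true(0.5, 100.0, thursday))
print(condition_is_true(0.5, 100.0, sunday))
```

Because every aggregation in the expression groups by instance, each group holds a single series, so the quantile and avg steps leave that per-instance value unchanged. Note also that, as written, the day clause matches day_of_week() values 2 through 6; if you need a different schedule, adjust the comparison accordingly.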

The following Prometheus rules file sample shows the latency evaluation query and several other important fields:

alerting_rules.yaml

```yaml
global:
…
groups:
- name: example_alert
  rules:
  - alert: HTTP Response Request P95 Latency Exceeds Threshold
    expr: quantile by (instance)(0.95, avg by (instance)(sum by (instance)(rate(http_request_duration_seconds_sum[5m]))/sum by (instance)(rate(http_request_duration_seconds_count[5m])))) > 1 and on () (day_of_week() == 5 or day_of_week() > 1)
    for: 2m
    labels:
      action_required: true
      severity: critical
    annotations:
      description: HTTP Response Request P95 Latency Exceeds Threshold
…
```

You can migrate this alert rule by passing the rules file to the gcloud migration command:

```
$ gcloud alpha monitoring policies migrate --policies-from-prometheus-alert-rules-yaml=alerting_rules.yaml

Each call of the migration tool will create a new set of alert policies and/or notification channels. Thus, the migration tool should not be used to update existing alert policies and/or notification channels.

Do you want to continue (Y/N)? y

Created alert policy [projects/PROJECT_ID/alertPolicies/2057604254502431952].
```

Cloud Monitoring then creates an alerting policy that contains the PromQL query:

```json
{
  "name": "projects/PROJECT_ID/alertPolicies/2057604254502431952",
  "displayName": "exampleAlert/HTTPsRequestP95LatencyExceedsThreshold",
  "documentation": {
    "content": "HTTP Response Request P95 Latency Exceeds Threshold on Instance {{ $labels.instance }} down",
    "mimeType": "text/markdown"
  },
  "userLabels": {},
  "conditions": [
    {
      "name": "projects/PROJECT_ID/alertPolicies/2057604254502431952/conditions/8617992930186250455",
      "displayName": "HTTPsRequestP95LatencyExceedsThreshold",
      "conditionPrometheusQueryLanguage": {
        "alertRule": "HTTPsRequestP95LatencyExceedsThreshold",
        "duration": "120s",
        "labels": {
          "severity": "critical"
        },
        "query": "quantile by (instance)(0.95, avg by (instance)(sum by (instance)(rate(http_request_duration_seconds_sum[5m]))/sum by (instance)(rate(http_request_duration_seconds_count[5m])))) > 0.003 and on () (day_of_week() == 5 or day_of_week() > 1)",
        "ruleGroup": "exampleAlert"
      }
    }
  ],
  "alertStrategy": {},
  "combiner": "OR",
  "enabled": true,
  "notificationChannels": [],
  "creationRecord": {
    "mutateTime": "2023-07-28T16:42:47.170885724Z",
    "mutatedBy": "ACCOUNT_ID"
  },
  "mutationRecord": {
    "mutateTime": "2023-07-28T16:42:47.170885724Z",
    "mutatedBy": "ACCOUNT_ID"
  }
}
```
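If you would rather author a policy body yourself than run the migration command, the correspondence between rule fields and policy fields can be sketched in Python. This is a rough illustration, not the migration tool's actual logic: the helper names are hypothetical, the mapping is inferred only from the example policy above, and the real tool may normalize names or handle labels and annotations differently. Server-assigned fields such as name, creationRecord, and mutationRecord are deliberately left out.

```python
import json
import re


def duration_to_seconds(duration: str) -> str:
    """Convert a simple Prometheus duration such as '2m' into the '120s' form."""
    units = {"s": 1, "m": 60, "h": 3600}
    match = re.fullmatch(r"(\d+)([smh])", duration)
    if match is None:
        raise ValueError(f"unsupported duration: {duration!r}")
    return f"{int(match.group(1)) * units[match.group(2)]}s"


def rule_to_policy(rule_group: str, rule: dict) -> dict:
    """Build a policy body shaped like the example policy from one alert rule."""
    return {
        "displayName": f"{rule_group}/{rule['alert']}",
        "documentation": {
            "content": rule.get("annotations", {}).get("description", ""),
            "mimeType": "text/markdown",
        },
        "conditions": [
            {
                "displayName": rule["alert"],
                "conditionPrometheusQueryLanguage": {
                    "alertRule": rule["alert"],
                    "duration": duration_to_seconds(rule.get("for", "0s")),
                    "labels": rule.get("labels", {}),
                    "query": rule["expr"],
                    "ruleGroup": rule_group,
                },
            }
        ],
        "combiner": "OR",
        "enabled": True,
    }


# Hypothetical rule, shortened for readability.
example_rule = {
    "alert": "HighRequestLatency",
    "expr": "rate(http_request_duration_seconds_sum[5m])"
            " / rate(http_request_duration_seconds_count[5m]) > 0.003",
    "for": "2m",
    "labels": {"severity": "critical"},
    "annotations": {"description": "Request latency exceeds threshold"},
}

policy = rule_to_policy("exampleAlert", example_rule)
print(json.dumps(policy, indent=2))
```

The printed JSON has the same overall shape as the policy shown above and could serve as a starting point for a request body; check the field-mapping documentation before relying on it.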

These examples show how easy it is to get started with Cloud Monitoring’s new PromQL Alerting capabilities. You can migrate your existing Prometheus alert rules to Cloud Monitoring alert policies, or create PromQL alerting policies in Cloud Monitoring with the API or CLI.

If you want to install alerts locally in your clusters and have your clusters route them to a Prometheus Alertmanager, then you can continue to use the Managed Service for Prometheus rule evaluation options for managed or self-deployed collection. Otherwise, you can reduce the time you spend on alert management by migrating your existing Prometheus alert rules to Cloud Monitoring alerting policies.

To learn more, check out our documentation:

For a general overview of PromQL Alerting, including a list of migration options and alerting rule-to-alerting policy field mapping, see Alerting policies with PromQL.

For detailed information about how to migrate alerting rules and receivers with the Google Cloud CLI, see Migrate alerting rules and receivers from Prometheus.

For a walkthrough of how to use Cloud Monitoring API to create alerting policies with a PromQL query, including several examples, see Create alerting policies with a PromQL query.

As always, please leave us feedback during the preview so we can improve the experience!
