Friday, March 1, 2024

Create PromQL alerts in Cloud Monitoring now in Public Preview

Last year, we introduced Managed Service for Prometheus so that you can scale your Prometheus environment more easily than hosting it yourself. As part of that announcement, we also added support for Prometheus’ popular PromQL query language in Cloud Monitoring. Since then, we’ve heard from both experienced and new Prometheus users that they’d like to manage alerts in a single ecosystem. Today, we’re excited to announce that Cloud Monitoring now supports alerting using PromQL, in Public Preview. You can now create globally scoped alerting policies based on PromQL queries alongside your Cloud Monitoring metrics and dashboards, without having to maintain backend services.

In this release, you can:

Write globally scoped PromQL-based alerting policies in Cloud Monitoring 

Reference Prometheus, GCP system, and custom metrics in your alerting policy 

Route notifications to any Cloud Monitoring-supported notification channel. Use Email, Slack, SMS, and mobile push to notify your team members; use Webhooks to send the notification to any public endpoint, or Pub/Sub for any private endpoint.

Customize the subject line in an Email notification channel 

Easily migrate your existing Prometheus alert_rules to Cloud Monitoring 

Manage your configuration with Terraform 

If you already have Prometheus alert rules, then you can migrate them to Cloud Monitoring alerting policies containing a PromQL query. You can also create your own PromQL alerting policies directly in Cloud Monitoring by using the Monitoring API or the gcloud CLI.
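As a sketch of the API route, the request body for the Monitoring API’s `projects.alertPolicies.create` method might look like the following. The policy names, query, and `PROJECT_ID` are illustrative placeholders, not output from the service:

```python
import json

# Minimal alert-policy body for the Monitoring API
# (projects.alertPolicies.create). Names and query are illustrative.
policy = {
    "displayName": "PromQL disk-space alert",
    "combiner": "OR",
    "conditions": [{
        "displayName": "Volume almost full",
        "conditionPrometheusQueryLanguage": {
            # Fires when less than 10% of a volume's capacity remains.
            "query": ("kubelet_volume_stats_available_bytes"
                      " / kubelet_volume_stats_capacity_bytes * 100 < 10"),
            "duration": "120s",
            "labels": {"severity": "warning"},
        },
    }],
    "enabled": True,
}

# POST this body to:
#   https://monitoring.googleapis.com/v3/projects/PROJECT_ID/alertPolicies
print(json.dumps(policy, indent=2))
```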

Understanding alerting policies 

The following Prometheus alert rule fires when a Kubernetes volume has less than 10% of its disk space remaining — that is, when it is at least 90% full. Let’s review a few of its fields:

Alert: Sets a name for the alert to help users identify what’s happening

Expr: The PromQL expression to evaluate. In this case, if less than 10% of the volume’s capacity remains available, then an alert fires.

For: Specifies the length of time during which each evaluation of the query must generate a `true` value before the alert fires.

Summary: The customized subject line to be used in the alert.

The labels and annotations fields provide additional information about the alert and can be used for passing additional context or actions.
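To illustrate how placeholders like `{{ $labels.instance }}` and `{{ $value }}` in the annotations are expanded, here is a toy stand-in renderer. It is a simplified illustration, not Prometheus’s actual Go templating engine:

```python
import re

def render(template: str, labels: dict, value: float) -> str:
    """Toy stand-in for Prometheus's templating: expands
    {{ $value }} and {{ $labels.<name> }} placeholders."""
    out = template.replace("{{ $value }}", str(value))
    return re.sub(r"\{\{ \$labels\.(\w+) \}\}",
                  lambda m: labels.get(m.group(1), ""), out)

summary = "Kubernetes Volume out of disk space (instance {{ $labels.instance }})"
print(render(summary, {"instance": "node-1"}, 8.5))
# Kubernetes Volume out of disk space (instance node-1)
```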

```yaml
- alert: KubernetesVolumeOutOfDiskSpace
  expr: kubelet_volume_stats_available_bytes / kubelet_volume_stats_capacity_bytes * 100 < 10
  for: 120s
  labels:
    severity: warning
  annotations:
    summary: Kubernetes Volume out of disk space (instance {{ $labels.instance }})
    description: "Volume is almost full (< 10% left)\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
```

Migrate alert rules to Cloud Monitoring 

For most Prometheus users, alert rules are stored in a Prometheus config file or rules file. You can now migrate these files to Cloud Monitoring alerting policies with PromQL queries. This way, you can store all of your alert policies in one place. Let’s look at a sample migration process:

The following PromQL expression fires if instance latency spikes above a 95th-percentile threshold on weekdays:

```
quantile by (instance)(0.95, avg by (instance)(sum by (instance)(rate(http_request_duration_seconds_sum[5m])) / sum by (instance)(rate(http_request_duration_seconds_count[5m])))) > 0.003 and on () (day_of_week() == 5 or day_of_week() > 1)
```
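The core of this expression is the `rate(..._sum) / rate(..._count)` pattern, which yields average request latency over the window. A quick numeric illustration with assumed figures (1,200 requests totalling 4.2 seconds of latency over a 5-minute window):

```python
# rate() divides the increase over the window by the window length,
# so the ratio below is average seconds of latency per request.
duration_sum_rate = 4.2 / 300    # latency-seconds accrued per second
request_count_rate = 1200 / 300  # requests per second
avg_latency = duration_sum_rate / request_count_rate

# 0.0035 s = 3.5 ms, which would exceed the 0.003 s threshold above.
print(f"{avg_latency * 1000:.1f} ms")
```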

The following Prometheus rules file sample shows the latency evaluation query and several other important fields:


```yaml
global:
...
groups:
- name: example_alert
  rules:
  - alert: HTTP Response Request P95 Latency Exceeds Threshold
    expr: quantile by (instance)(0.95, avg by (instance)(sum by (instance)(rate(http_request_duration_seconds_sum[5m])) / sum by (instance)(rate(http_request_duration_seconds_count[5m])))) > 1 and on () (day_of_week() == 5 or day_of_week() > 1)
    for: 2m
    labels:
      action_required: true
      severity: critical
    annotations:
      description: HTTP Response Request P95 Latency Exceeds Threshold
...
```

You can migrate this alert rule by providing it to the gcloud migration command:

```shell
$ gcloud alpha monitoring policies migrate --policies-from-prometheus-alert-rules-yaml=alerting_rules.yaml

Each call of the migration tool will create a new set of alert policies and/or notification channels. Thus, the migration tool should not be used to update existing alert policies and/or notification channels.

Do you want to continue (Y/N)? y

Created alert policy [projects/PROJECT_ID/alertPolicies/2057604254502431952].
```

Cloud Monitoring then creates an alerting policy that contains the PromQL query:

```json
{
  "name": "projects/PROJECT_ID/alertPolicies/2057604254502431952",
  "displayName": "exampleAlert/HTTPsRequestP95LatencyExceedsThreshold",
  "documentation": {
    "content": "HTTP Response Request P95 Latency Exceeds Threshold on Instance {{ $labels.instance }} down",
    "mimeType": "text/markdown"
  },
  "userLabels": {},
  "conditions": [
    {
      "name": "projects/PROJECT_ID/alertPolicies/2057604254502431952/conditions/8617992930186250455",
      "displayName": "HTTPsRequestP95LatencyExceedsThreshold",
      "conditionPrometheusQueryLanguage": {
        "alertRule": "HTTPsRequestP95LatencyExceedsThreshold",
        "duration": "120s",
        "labels": {
          "severity": "critical"
        },
        "query": "quantile by (instance)(0.95, avg by (instance)(sum by (instance)(rate(http_request_duration_seconds_sum[5m])) / sum by (instance)(rate(http_request_duration_seconds_count[5m])))) > 0.003 and on () (day_of_week() == 5 or day_of_week() > 1)",
        "ruleGroup": "exampleAlert"
      }
    }
  ],
  "alertStrategy": {},
  "combiner": "OR",
  "enabled": true,
  "notificationChannels": [],
  "creationRecord": {
    "mutateTime": "2023-07-28T16:42:47.170885724Z",
    "mutatedBy": "ACCOUNT_ID"
  },
  "mutationRecord": {
    "mutateTime": "2023-07-28T16:42:47.170885724Z",
    "mutatedBy": "ACCOUNT_ID"
  }
}
```
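The migrated policy reflects a straightforward field mapping: `alert` becomes `alertRule` and `displayName`, `expr` becomes `query`, `for` becomes `duration` (normalized to seconds), and the group name becomes `ruleGroup`. The following sketch illustrates that mapping; the helper function and sample rule are our own illustration, not part of the migration tool:

```python
def prom_duration_to_seconds(d: str) -> str:
    """Toy conversion of a Prometheus duration like '2m' or '120s'
    to the API's seconds form (seconds/minutes/hours only)."""
    units = {"s": 1, "m": 60, "h": 3600}
    return f"{int(d[:-1]) * units[d[-1]]}s"

def rule_to_condition(rule: dict, group: str) -> dict:
    """Illustrative sketch of the rule -> condition mapping;
    field names follow the migrated policy shown above."""
    return {
        "displayName": rule["alert"],
        "conditionPrometheusQueryLanguage": {
            "alertRule": rule["alert"],
            "query": rule["expr"],
            "duration": prom_duration_to_seconds(rule["for"]),
            "labels": rule.get("labels", {}),
            "ruleGroup": group,
        },
    }

rule = {"alert": "HighLatency", "expr": "up == 0", "for": "2m",
        "labels": {"severity": "critical"}}
print(rule_to_condition(rule, "example_alert"))
```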

These examples show how easy it is to get started with Cloud Monitoring’s new PromQL Alerting capabilities. You can migrate your existing Prometheus alert rules to Cloud Monitoring alert policies, or create PromQL alerting policies in Cloud Monitoring with the API or CLI.  

If you want to install alerts locally in your clusters and have your clusters route them to a Prometheus Alert Manager, then you can continue to use the Managed Service for Prometheus rule evaluation options for managed or self-deployed collection. Otherwise, you can reduce the time you spend on alert management by migrating your existing Prometheus alert rules to Cloud Monitoring alerting policies.

To learn more, check out our documentation:

For a general overview of PromQL Alerting, including a list of migration options and alerting rule-to-alerting policy field mapping, see Alerting policies with PromQL.

For detailed information about how to migrate alerting rules and receivers with the Google Cloud CLI, see Migrate alerting rules and receivers from Prometheus.

For a walkthrough of how to use Cloud Monitoring API to create alerting policies with a PromQL query, including several examples, see Create alerting policies with a PromQL query.

As always, please leave us feedback during the preview so we can improve the experience!
