Tuesday, October 8, 2024

Analyze Pacemaker events in Cloud Logging

Customers deploying SAP on Google Cloud often use Pacemaker for high availability to support their most critical systems. Let’s take a look at how you can use Cloud Logging to conduct root cause analysis of Pacemaker clusters.

When multiple Pacemaker clusters run in Google Cloud, a central log store keeps the Pacemaker logs in one place and offers an easy way to analyze Pacemaker events such as fencing or resource failover.

The Ops Agent is the primary agent for collecting telemetry from your Compute Engine instances. Combining logging and metrics into a single agent, the Ops Agent uses Fluent Bit for logs, which supports high-throughput logging, and the OpenTelemetry Collector for metrics.

Install the Agent

Follow the Ops Agent installation guide to install the agent on a single VM from the command line or the Google Cloud Console. To install the agent on multiple VMs, use gcloud or automation tools. Make sure the VM does not have the legacy Cloud Logging agent or Cloud Monitoring agent installed.

Configure the Agent

By default, the Ops Agent’s built-in configuration collects file-based syslog logs. Pacemaker resource agents such as SAPHana write to the system log /var/log/messages on the SAP-certified operating systems SUSE and Red Hat.

Add the following configuration elements to the user configuration file /etc/google-cloud-ops-agent/config.yaml to stream Pacemaker logs to Cloud Logging. The paths defined below cover all default log files that Pacemaker writes to on SUSE and Red Hat.

pacemaker-log is the receiver ID, which determines the logName “projects/[PROJECT_ID]/logs/pacemaker-log” of the log entries streamed to Cloud Logging.

Note: If the logging section already contains other configuration, add only the new pacemaker-log receiver and pacemaker-pipeline entries under the existing receivers and pipelines keys.

```
logging:
  receivers:
    pacemaker-log:
      type: files
      include_paths:
      - /var/log/pacemaker.log
      - /var/log/cluster/corosync.log
      - /var/log/pacemaker/pacemaker.log
  service:
    pipelines:
      pacemaker-pipeline:
        receivers: [pacemaker-log]
```
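The note about existing configurations can be illustrated with a short Python sketch. It assumes the configuration file has already been loaded into a plain dictionary (for example with a YAML library, which is not shown here); the function name add_pacemaker_receiver and the sample app-log entries are hypothetical. The sketch adds only the pacemaker-log receiver and the pacemaker-pipeline pipeline and leaves any existing entries untouched.

```python
import copy

# Default Pacemaker log locations, as listed in the configuration above.
PACEMAKER_PATHS = [
    "/var/log/pacemaker.log",
    "/var/log/cluster/corosync.log",
    "/var/log/pacemaker/pacemaker.log",
]


def add_pacemaker_receiver(config):
    """Return a copy of config with the Pacemaker receiver and pipeline added."""
    merged = copy.deepcopy(config)  # leave the caller's dictionary unchanged
    logging_section = merged.setdefault("logging", {})
    receivers = logging_section.setdefault("receivers", {})
    receivers["pacemaker-log"] = {
        "type": "files",
        "include_paths": list(PACEMAKER_PATHS),
    }
    service = logging_section.setdefault("service", {})
    pipelines = service.setdefault("pipelines", {})
    pipelines["pacemaker-pipeline"] = {"receivers": ["pacemaker-log"]}
    return merged


# Hypothetical existing configuration with one unrelated receiver.
existing = {
    "logging": {
        "receivers": {
            "app-log": {"type": "files", "include_paths": ["/var/log/app.log"]}
        },
        "service": {"pipelines": {"app-pipeline": {"receivers": ["app-log"]}}},
    }
}
merged = add_pacemaker_receiver(existing)
print(sorted(merged["logging"]["receivers"]))
```

The existing app-log receiver and its pipeline stay in place, which mirrors editing the file by hand and adding only the Pacemaker entries.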

Restart the agent

Restart the agent to apply the user-specified configuration:

```
sudo service google-cloud-ops-agent restart
```

Validate the agent

Check the logging module log /var/log/google-cloud-ops-agent/subagents/logging-module.log to confirm that the Pacemaker logs are being collected. You should see entries similar to the ones below, listing the Pacemaker log files. If you run into issues, follow the troubleshooting guide.

```
[2021/08/19 04:34:26] [ info] [sp] stream processor started
[2021/08/19 04:34:26] [ info] [input:tail:tail.3] inotify_fs_add(): inode=51595418 watch_fd=1 name=/var/log/cluster/corosync.log
[2021/08/19 04:34:26] [ info] [input:tail:tail.0] inotify_fs_add(): inode=16814304 watch_fd=1 name=/var/log/messages
[2021/08/19 04:34:26] [ info] [input:tail:tail.2] inotify_fs_add(): inode=51506 watch_fd=1 name=/var/log/google-cloud-ops-agent/subagents/logging-module.log
[2021/08/19 04:34:26] [ info] [input:tail:tail.3] inotify_fs_add(): inode=51595416 watch_fd=2 name=/var/log/pacemaker/pacemaker.log
```
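This check can also be scripted. The Python sketch below assumes the line layout shown in the sample entries, where each watched file appears after name= on an inotify_fs_add() line; other agent versions may log differently. The function name watched_files is hypothetical. It reports which of the configured Pacemaker paths the agent is actually watching.

```python
import re

# Matches the file name at the end of "inotify_fs_add()" log lines.
WATCH_RE = re.compile(r"inotify_fs_add\(\): .*\bname=(\S+)")

# Paths configured in the pacemaker-log receiver.
CONFIGURED = {
    "/var/log/pacemaker.log",
    "/var/log/cluster/corosync.log",
    "/var/log/pacemaker/pacemaker.log",
}


def watched_files(log_lines):
    """Return the set of file paths the tail inputs report watching."""
    found = set()
    for line in log_lines:
        match = WATCH_RE.search(line)
        if match:
            found.add(match.group(1))
    return found


# Sample lines taken from the log excerpt above.
sample = [
    "[2021/08/19 04:34:26] [ info] [sp] stream processor started",
    "[2021/08/19 04:34:26] [ info] [input:tail:tail.3] inotify_fs_add(): "
    "inode=51595418 watch_fd=1 name=/var/log/cluster/corosync.log",
    "[2021/08/19 04:34:26] [ info] [input:tail:tail.0] inotify_fs_add(): "
    "inode=16814304 watch_fd=1 name=/var/log/messages",
    "[2021/08/19 04:34:26] [ info] [input:tail:tail.3] inotify_fs_add(): "
    "inode=51595416 watch_fd=2 name=/var/log/pacemaker/pacemaker.log",
]
active = watched_files(sample) & CONFIGURED
print(sorted(active))
```

To run it against a real node, pass an open file handle for logging-module.log instead of the sample list. Not every configured path exists on every distribution, so a missing path is not necessarily an error as long as at least one Pacemaker log file is watched.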

Validate in Cloud Logging

Use the following log filter (replace PROJECT_ID) in the Cloud Logging Logs Explorer to confirm that the Pacemaker logs are being streamed there.

```
logName="projects/[PROJECT_ID]/logs/pacemaker-log"
```

Now you can use the Cloud Logging Logs Explorer to analyze Pacemaker events. The sample log filter below helps to isolate the critical Pacemaker actions and events. Replace INSTANCE_ID_NODE1 and INSTANCE_ID_NODE2 with the actual instance IDs of the two cluster nodes. The filter captures:

Actions of the cluster nodes and cluster resources, such as start, stop or promote

Failed resource operations, such as start, stop or promote

Fencing actions, reasons (loss of cluster nodes, resource failure etc.) and results

Corosync communication errors

Cluster membership changes, such as members joining or leaving

```
resource.type="gce_instance"
(resource.labels.instance_id="[INSTANCE_ID_NODE1]" OR resource.labels.instance_id="[INSTANCE_ID_NODE2]")
(protoPayload.methodName="v1.compute.instances.reset"
OR jsonPayload.message: ("LogNodeActions"
OR ("remote_op_done" "Operation")
OR ("notice" "LogAction")
OR ("TOTEM" "failed" OR "membership" OR "Retransmit")
OR ("Result of" "operation" NOT "ok" NOT "Cancelled" NOT "probe")
OR ("SAPHana(" "WARNING" OR "ERROR")
OR ("SAPInstance(" OR "stonith-ng:" OR "gcp:stonith:" OR "gcp:alias:" OR "gcp-vpc-move-vip:" OR "fence_gce:" "ERROR" OR "Failed")
)
)
```
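When a cluster has a different number of nodes, or when the same filter is reused across several clusters, the instance part of the filter can be generated instead of edited by hand. The following Python sketch shows one way to do that; the helper names instance_clause and node_filter and the sample instance IDs are hypothetical.

```python
def instance_clause(instance_ids):
    """Build the parenthesized OR clause that selects the cluster nodes."""
    if not instance_ids:
        raise ValueError("at least one instance ID is required")
    terms = [f'resource.labels.instance_id="{i}"' for i in instance_ids]
    return "(" + " OR ".join(terms) + ")"


def node_filter(instance_ids):
    """Combine the resource type and the instance clause, one per line."""
    return 'resource.type="gce_instance"\n' + instance_clause(instance_ids)


# Placeholder IDs; use the real instance IDs of the cluster nodes.
print(node_filter(["1111111111", "2222222222"]))
```

The output forms the first two lines of the sample filter. Append the event clause from the filter above on the following lines; in the Logging query language, expressions on separate lines are combined with AND.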

Now that Pacemaker logs from all your clusters are stored in Cloud Logging, you can analyze Pacemaker events from any of your clusters in one central place. If you need further support from the Google Cloud Customer Care team, you also save the time and effort of collecting logs and transferring them to the support agent.

To monitor Pacemaker clusters and receive alerts, read the blog post “What’s happening in your SAP systems? Find out with Pacemaker Alerts”.

Source: Cloud Blog
