Introducing automated failover for private workloads using Cloud DNS routing policies with health checks

By mullaned2002

October 24, 2022


High availability is an important consideration for many customers, and we're happy to introduce health checking for private workloads in Cloud DNS to help build business continuity/disaster recovery (BC/DR) architectures. Typical BC/DR architectures are built using multi-regional deployments on Google Cloud. In a previous blog post, we showed how highly available global applications can be published using Cloud DNS routing policies. The globally distributed, policy-based DNS configuration provided reliability, but in case of a failure, it required manual intervention to update the geo-location policy configuration. In this post, we will use Cloud DNS health check support for Internal Load Balancers to automatically fail over to healthy instances.

We will use the same setup as in the previous post. We have an internal knowledge-sharing web application with a classic two-tier architecture: front-end servers that serve web requests from our engineers, and back-end servers that hold the application's data.

Our engineers in San Francisco, Paris, and Tokyo will use this application, so we decided to deploy our servers in three Google Cloud regions for lower latency, better performance, and lower cost.

High-level design

The wiki application is accessible in each region via an Internal Load Balancer (ILB). Engineers use the domain name wiki.example.com to connect to the front-end web app over Interconnect or VPN. The geo-location policy will use the Google Cloud region where the Interconnect or VPN lands as the source for the traffic and look for the closest available endpoint.

DNS resolution based on the location of the user

With the above setup, if our application in one of the regions goes down, we have to manually update the geo-location policy and remove the affected region from the configuration. Until someone detects the failure and updates the policy, the end users close to that region will not be able to reach the application. Not a great user experience. How can we design this better?

Google Cloud is introducing Cloud DNS health check support for Internal Load Balancers. For an internal TCP/UDP load balancer, Cloud DNS uses the existing health checks of the back-end service and receives health signals directly from the individual back-end instances. This enables automatic failover when the endpoints fail their health checks.

For example, if the US front-end service is unhealthy, Cloud DNS may return the IP address of the load balancer in the next closest region (in our example, Tokyo) to the San Francisco clients, depending on latency.

DNS resolution based on the location of the user and the health of the ILB backends

Enabling the health checks for the wiki.example.com record provides automatic failover in case of a failure and ensures that Cloud DNS returns only healthy endpoints in response to client queries. This removes the need for manual intervention and significantly reduces failover time.
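To make this behavior concrete, here is a simplified shell model of geo-location routing with health checks. It is a sketch, not how Cloud DNS is implemented. It assumes each client region has a fixed, proximity-ordered list of candidate regions and returns the first region whose ILB backends are healthy. The `pick_region` function and the ordering below are hypothetical; Cloud DNS works out proximity itself. The region names match the ones used in this example's routing policy.

```shell
#!/usr/bin/env bash
# Simplified model of a geo-location policy with health checks.
# ASSUMPTION: the proximity order per client region is hard-coded for
# illustration only; Cloud DNS determines proximity internally.

# Print the first region in the preference list that is also healthy.
# Usage: pick_region "<preferred regions>" "<healthy regions>"
pick_region() {
  local preferred="$1" healthy="$2" region h
  for region in $preferred; do
    for h in $healthy; do
      if [ "$region" = "$h" ]; then
        echo "$region"
        return 0
      fi
    done
  done
  # No healthy endpoint is available.
  return 1
}

# Hypothetical proximity order for clients whose traffic lands in us-west2.
sf_order="us-west2 asia-northeast1 europe-west1"

pick_region "$sf_order" "us-west2 europe-west1 asia-northeast1"  # all ILBs healthy
pick_region "$sf_order" "europe-west1 asia-northeast1"          # US ILB unhealthy
```

With all three regions healthy, the first call prints `us-west2`. When the US backends fail their health checks, the same lookup prints `asia-northeast1`, and no one has to edit the policy.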

The Cloud DNS routing policy configuration would look like this:

Creating the Cloud DNS managed zone:

```shell
gcloud dns managed-zones create wiki-private-zone \
    --description="DNS Zone for the front-end servers of the wiki application" \
    --dns-name=wiki.example.com \
    --networks=prod-vpc \
    --visibility=private
```

Creating the Cloud DNS record set:

For health checking to work, we need to reference the ILB using the ILB forwarding rule name. If we use the ILB IP instead, then Cloud DNS will not check the health of the endpoint.

See the official documentation page for more information on how to configure Cloud DNS routing policies with health checks.

```shell
gcloud dns record-sets create front.wiki.example.com. \
    --ttl=30 \
    --type=A \
    --zone=wiki-private-zone \
    --routing-policy-type=GEO \
    --routing-policy-data="us-west2=us-ilb-forwarding-rule;europe-west1=eu-ilb-forwarding-rule;asia-northeast1=asia-ilb-forwarding-rule" \
    --enable-health-checking
```
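The `--routing-policy-data` value in the record-set command is a semicolon-separated list of `region=forwarding-rule` pairs. As more regions are added, a small helper can assemble this value from one pair per argument. The `build_geo_policy_data` helper is a hypothetical convenience sketch and is not part of gcloud. The forwarding rule names are the ones from the command above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: join "region=forwarding-rule" pairs with ';'
# to build the value passed to --routing-policy-data.
build_geo_policy_data() {
  # "$*" joins all arguments using the first character of IFS.
  local IFS=';'
  echo "$*"
}

policy_data=$(build_geo_policy_data \
  "us-west2=us-ilb-forwarding-rule" \
  "europe-west1=eu-ilb-forwarding-rule" \
  "asia-northeast1=asia-ilb-forwarding-rule")

echo "$policy_data"
```

The result can then be passed to the command as `--routing-policy-data="$policy_data"`.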

Note: Cloud DNS uses the health checks already configured on the load balancers themselves, so users do not need to configure any additional health checks for Cloud DNS. See the official documentation page for information on how to create health checks for Google Cloud load balancers.

With this configuration, if we were to lose the application in one region due to an incident, the health checks on the ILB would fail, and Cloud DNS would automatically resolve new user queries to the next closest healthy endpoint.

We can extend this configuration so that front-end servers send traffic only to healthy back-end servers in the region closest to them.

We would configure the front-end servers to connect to the global hostname backend.wiki.example.com. The Cloud DNS geo-location policy with health checks will use the front-end servers' Google Cloud region to resolve this hostname to the closest available healthy back-end tier Internal Load Balancer.

Front-end to back-end communication (instance to instance)
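The back-end record can be sketched by mirroring the front-end record set created earlier. It uses the same flags but the back-end hostname. The back-end tier forwarding rule names below are hypothetical placeholders. The command is stored in an array and, by default, only printed for review; set `RUN=1` to execute it. This dry-run wrapper is our own convenience and is not a gcloud feature.

```shell
#!/usr/bin/env bash
# Sketch: geo-location record with health checks for the back-end tier.
# ASSUMPTION: the *-backend-ilb-forwarding-rule names are hypothetical;
# substitute the forwarding rules of your back-end tier ILBs.
backend_cmd=(
  gcloud dns record-sets create backend.wiki.example.com.
  --ttl=30
  --type=A
  --zone=wiki-private-zone
  --routing-policy-type=GEO
  --routing-policy-data="us-west2=us-backend-ilb-forwarding-rule;europe-west1=eu-backend-ilb-forwarding-rule;asia-northeast1=asia-backend-ilb-forwarding-rule"
  --enable-health-checking
)

# Print the command by default; run it only when RUN=1 is set.
if [ "${RUN:-0}" = "1" ]; then
  "${backend_cmd[@]}"
else
  echo "${backend_cmd[*]}"
fi
```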

Putting it all together, we have now set up our multi-regional, multi-tiered application with DNS policies that automatically fail over to the healthy endpoint closest to the end user.
