Traffic Director: TLS routing using Envoy gateway proxy on GKE

A common pattern for solving application networking challenges is to use a service mesh. Users familiar with such architectures know the challenges around network flow, security, and observability. Traffic Director is a managed Google Cloud service that helps solve these problems.

In this post, we share a sample architecture that uses Traffic Director to configure gateway proxies running on a GKE cluster. TLS routing rules route traffic from clients outside the cluster to workloads deployed on the cluster.

Furthermore, we will demonstrate how north-south traffic could enter the service mesh using the Envoy proxy acting as an ingress gateway. Additionally, we will demonstrate the use of the service routing API to route such traffic and share some useful troubleshooting techniques.

The following diagram illustrates all the key components and traffic flows that are described in this blog.

1. An internal client will make a request to a sample application (e.g. https://www.example.com) and the request will be forwarded to an internal load balancer.

2. All requests to the internal load balancer are forwarded to pods containing Envoy proxies, running as a deployment of gateway proxies on a GKE cluster.

3. These Envoy proxies receive their configuration from Traffic Director, which acts as their control plane.

4. Based on the routing configuration received from Traffic Director, the proxies then route the incoming requests to the appropriate backend service for the target application.

5. The request will be handled by any of the pods associated with the backend service.

Backend service

In this section, we show how to deploy our target application and create the backend for it, i.e. the echo-server-neg network endpoint group that maps to that application, as depicted in the diagram above. First, let’s deploy the echo service. Notice the following annotation:

```yaml
cloud.google.com/neg: '{"exposed_ports":{"8443":{"name": "echo-server-neg"}}}'
```

It instructs Google Cloud to create a Network Endpoint Group (NEG), one for each zone in the region. Later, these NEGs will be configured as backends for Traffic Director.

Using routing information, i.e. scope_name and port 8443, the Envoy gateway proxy routes requests to the correct NEG (step 5 on the architecture diagram). We use the mendhak/http-https-echo:23 image as our backend. You can find a license for it here.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: echo-server
  annotations:
    cloud.google.com/neg: '{"exposed_ports":{"8443":{"name": "echo-server-neg"}}}'
    networking.gke.io/max-rate-per-endpoint: "1000"
spec:
  ports:
  - port: 8443
    name: https-port
    protocol: TCP
    targetPort: 9999
  # The selector ties the Service to the echo-app pods defined below.
  selector:
    run: echo-app
  type: ClusterIP
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    run: echo-app
  name: echo-app
spec:
  replicas: 3
  selector:
    matchLabels:
      run: echo-app
  template:
    metadata:
      labels:
        run: echo-app
    spec:
      containers:
      - image: mendhak/http-https-echo:23
        name: echo-app
        env:
        - name: HTTPS_PORT
          value: "9999"
        ports:
        - protocol: TCP
          containerPort: 9999
```
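After the echo Service is applied, it can be worth confirming that the zonal NEGs exist before wiring them to Traffic Director. The following is a minimal sketch, not part of the original walkthrough: the helper name list_echo_negs is hypothetical, and it assumes gcloud is already authenticated against the project that owns the cluster.

```shell
# Hypothetical helper: list the zonal NEGs created for the echo Service.
# The NEG name defaults to the one set in the cloud.google.com/neg annotation.
list_echo_negs() {
  neg_name="${1:-echo-server-neg}"
  gcloud compute network-endpoint-groups list \
    --filter="name=${neg_name}" \
    --format="value(name,zone.basename(),size)"
}
```

Calling list_echo_negs with no arguments should print one line per zone that has a NEG; the zones it reports are the ones to use when adding backends later.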

Ingress Gateway

Traffic enters the service mesh through the ingress gateway (step 2). The Envoy proxy acts as this gateway and uses the SNI information in the request to perform routing. We will apply the Traffic Director configuration before deploying the ingress gateway pods. Although this order is not a hard requirement, it provides a better user experience, i.e. you will see fewer errors in the Envoy logs.

Below are sample gcloud commands to deploy the health check and backend for Traffic Director. Take note of the name: td-tls-echo-service.

```shell
# Enable APIs
gcloud services enable \
  trafficdirector.googleapis.com \
  networkservices.googleapis.com

# This health check is associated with the Traffic Director
gcloud compute health-checks create https td-https-health-check \
  --enable-logging \
  --use-serving-port

# Create Backend Service
gcloud compute backend-services create td-tls-echo-service \
  --global \
  --load-balancing-scheme=INTERNAL_SELF_MANAGED \
  --port-name=https \
  --health-checks td-https-health-check
```

Up to this point, we have created NEGs for the target echo application and the Traffic Director backend above. Let’s now add those NEGs as backends for the Backend Service.

```shell
# Add our NEGs as backends to the Traffic Director backend service
# Because we have created 1 NEG per each zone of the regional cluster, we add 3 NEGs (1 per zone).
BACKEND_SERVICE=td-tls-echo-service
ECHO_NEG_NAME=echo-server-neg
MAX_RATE_PER_ENDPOINT=10

gcloud compute backend-services add-backend $BACKEND_SERVICE \
  --global \
  --network-endpoint-group $ECHO_NEG_NAME \
  --network-endpoint-group-zone us-central1-b \
  --balancing-mode RATE \
  --max-rate-per-endpoint $MAX_RATE_PER_ENDPOINT

gcloud compute backend-services add-backend $BACKEND_SERVICE \
  --global \
  --network-endpoint-group $ECHO_NEG_NAME \
  --network-endpoint-group-zone us-central1-a \
  --balancing-mode RATE \
  --max-rate-per-endpoint $MAX_RATE_PER_ENDPOINT

gcloud compute backend-services add-backend $BACKEND_SERVICE \
  --global \
  --network-endpoint-group $ECHO_NEG_NAME \
  --network-endpoint-group-zone us-central1-c \
  --balancing-mode RATE \
  --max-rate-per-endpoint $MAX_RATE_PER_ENDPOINT
```
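Since the three add-backend calls differ only in the zone, they can also be expressed as a loop. The following is a sketch under stated assumptions: the function name add_neg_backends is hypothetical, and the defaults simply mirror the names used in this post.

```shell
# Hypothetical helper: add the same NEG from each given zone to one backend service.
# Zones are passed as arguments; the function stops at the first failing call.
add_neg_backends() {
  backend_service="${BACKEND_SERVICE:-td-tls-echo-service}"
  neg_name="${ECHO_NEG_NAME:-echo-server-neg}"
  max_rate="${MAX_RATE_PER_ENDPOINT:-10}"
  for zone in "$@"; do
    gcloud compute backend-services add-backend "$backend_service" \
      --global \
      --network-endpoint-group "$neg_name" \
      --network-endpoint-group-zone "$zone" \
      --balancing-mode RATE \
      --max-rate-per-endpoint "$max_rate" || return 1
  done
}
```

For the regional cluster in this post, the call would be add_neg_backends us-central1-a us-central1-b us-central1-c. Afterwards, gcloud compute backend-services get-health td-tls-echo-service --global reports the health of the endpoints in each attached NEG.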

Note that the deployment for the ingress gateway below uses an initContainer running a bootstrap image. The arguments passed to this image allow it to generate the Envoy bootstrap configuration. An alternative is to write the Envoy bootstrap configuration manually, store it in a ConfigMap, and mount it in the Envoy container. Take note of --scope_name=gateway-proxy. As we will see later, scope_name associates our gateway with the service routing Gateway resource.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ecom-envoy-gw
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ecom-gw-envoy
  template:
    metadata:
      labels:
        app: ecom-gw-envoy
    spec:
      containers:
      - env:
        - name: ENVOY_UID
          value: '1337'
        image: envoyproxy/envoy:v1.21.2
        imagePullPolicy: Always
        name: envoy
        resources:
          limits:
            memory: 1Gi
            cpu: '2'
          requests:
            memory: 512Mi
            cpu: 500m
        volumeMounts:
        - mountPath: /etc/envoy
          name: envoy-bootstrap
      initContainers:
      - args:
        - --project_number=112233445566
        - --scope_name=gateway-proxy
        - --bootstrap_file_output_path=/etc/envoy/envoy.yaml
        - --traffic_director_url=trafficdirector.googleapis.com:443
        image: gcr.io/trafficdirector-prod/xds-client-bootstrap-generator:v0.1.0
        imagePullPolicy: Always
        name: td-bootstrap-writer
        volumeMounts:
        - mountPath: /etc/envoy
          name: envoy-bootstrap
      volumes:
      - emptyDir: {}
        name: envoy-bootstrap
```
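Once this Deployment is applied, the name of the gateway pod is needed for the debugging steps later in this post. A small sketch of looking it up by label follows; the function name gateway_pod is hypothetical, and it assumes kubectl points at the cluster and namespace where the gateway runs.

```shell
# Hypothetical helper: print the name of the first Envoy gateway pod.
# The label matches the app: ecom-gw-envoy label in the Deployment template.
gateway_pod() {
  kubectl get pods --selector app=ecom-gw-envoy \
    --output jsonpath='{.items[0].metadata.name}'
}
```

The result can be passed to other commands, for example kubectl logs "$(gateway_pod)" -c envoy to read the proxy logs.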

The following is the configuration for the service routing Gateway resource; notice the scope: gateway-proxy. Its value matches the one we specified in the YAML file above. Additionally, we set our target port to 8443. Essentially, we are handling the north-south traffic as shown here. We will talk more about routing in the next section.

Traffic Director, as a control plane, generates the corresponding configuration and pushes it to the Envoy (step 3), which is our data plane.

```yaml
# gateway that is listening for traffic on port 8443
name: gateway8443
scope: gateway-proxy
ports:
- 8443
type: OPEN_MESH
```

Save the definition above in gateway8443.yaml and import it using the following command.

```shell
# Import gateway resource
gcloud network-services gateways import gateway8443 \
  --source=gateway8443.yaml \
  --location=global
```

In Cloud Operations Logging, you can run the query below to confirm the successful connection.

```
logName="projects/<project-id>/logs/trafficdirector.googleapis.com%2Fevents"
jsonPayload.@type="type.googleapis.com/google.networking.trafficdirector.type.TrafficDirectorLogEntry"
jsonPayload.description="Client connected successfully."
```
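The same check can be run from a terminal with gcloud logging read. This is a rough sketch rather than the authors' method: both function names are hypothetical, the project ID is a value you must supply, and the filter keeps only the log name and description clauses of the query above.

```shell
# Hypothetical helper: build the Logging filter for a given project ID.
td_connect_filter() {
  project_id="$1"
  # %%2F prints a literal %2F, the URL-encoded slash in the log name.
  printf 'logName="projects/%s/logs/trafficdirector.googleapis.com%%2Fevents" AND jsonPayload.description="Client connected successfully."' "$project_id"
}

# Hypothetical helper: print the five most recent matching log entries.
read_td_connect_logs() {
  project_id="$1"
  gcloud logging read "$(td_connect_filter "$project_id")" \
    --project "$project_id" \
    --limit 5 \
    --format "value(timestamp,jsonPayload.description)"
}
```

An empty result from read_td_connect_logs usually means the gateway pods have not connected yet or the project ID is wrong, so it is worth checking both before digging into the Envoy logs.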

For debugging, it is useful to access the local admin interface of Envoy by forwarding its admin port to your local machine. Execute the following from the command line: kubectl port-forward [YOUR POD] 15000. You should see output similar to the following:

```
Forwarding from 127.0.0.1:15000 -> 15000
Forwarding from [::1]:15000 -> 15000
```

Open a browser window and go to http://localhost:15000. You should see the admin interface page, as shown below.

Click the listeners link to see the names of the listeners received from Traffic Director and the ports they are listening on. The config_dump link is useful for inspecting the configuration that Traffic Director pushed to this proxy. The stats link outputs endpoint statistics that can be useful for debugging. Explore the public documentation to learn more about the Envoy administrative interface.
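The admin pages described above are plain HTTP endpoints, so they can also be queried with curl while the port-forward is running. A minimal sketch, assuming the forward is on local port 15000; the function name envoy_admin and the ADMIN_URL variable are hypothetical.

```shell
# Assumption: kubectl port-forward is still running on local port 15000.
ADMIN_URL="${ADMIN_URL:-http://localhost:15000}"

# Hypothetical helper: fetch one Envoy admin path,
# such as listeners, clusters, stats, or config_dump.
envoy_admin() {
  curl --silent --show-error "${ADMIN_URL}/$1"
}
```

For example, envoy_admin listeners prints the listeners received from Traffic Director, and envoy_admin clusters | grep health_flags gives a quick view of which upstream endpoints Envoy currently considers healthy.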

Routing

Let’s configure the routing. If you are not familiar with the Traffic Director service routing API, we recommend you read the Traffic Director service routing APIs overview to familiarize yourself with this API before you continue reading.

To use the service routing API in this tutorial, we need two resources, namely the Gateway resource and the TLSRoute resource. In the previous section we defined the Gateway resource. Below is the definition for TLSRoute resource.

```yaml
name: echo-tls-route
gateways:
- projects/112233445566/locations/global/gateways/gateway8443
rules:
- matches:
  - sniHost:
    - example.com
    alpn:
    - h2
  action:
    destinations:
    - serviceName: projects/112233445566/locations/global/backendServices/td-tls-echo-service
```

Notice the gateway8443 in the gateways section. It matches the name of the gateway resource we defined in the Ingress Gateway section. Also notice the value of serviceName. This is the name of our Traffic Director backend service that was also created earlier.

Note that in the TLSRoute, host matching is based on the SNI value of incoming requests. The sniHost value matches the domain, i.e. example.com. Additionally, the value h2 in the alpn section restricts matching to HTTP/2 requests. Lastly, the route sends all such requests to the backend specified by serviceName, i.e. the Traffic Director backend service we created (step 4 on the architecture diagram).

Save the definition above in the echo-tls-route.yaml file and import it using the following command.

```shell
gcloud network-services tls-routes import ecom-echo-tls-route \
  --source=echo-tls-route.yaml \
  --location=global
```
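Because the TLSRoute refers to the Gateway and the backend service by full resource name, a typo in either reference is an easy mistake to make. One way to compare them is to print both imported resources, as in this sketch; the function name show_routing_resources is hypothetical, and the defaults are the resource names used in the import commands in this post.

```shell
# Hypothetical helper: describe the imported Gateway and TLSRoute
# so their names and references can be compared side by side.
show_routing_resources() {
  gateway="${1:-gateway8443}"
  route="${2:-ecom-echo-tls-route}"
  gcloud network-services gateways describe "$gateway" --location=global || return 1
  gcloud network-services tls-routes describe "$route" --location=global
}
```

In the output, the gateways entry of the route should end in the Gateway name, and serviceName should end in td-tls-echo-service.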

Internal Load Balancer

The client connects to the internal load balancer (step 1 on the diagram), which is a regional Internal TCP/UDP Load Balancer. The load balancer sends traffic to the Envoy proxy that acts as an ingress gateway (step 2). It is a passthrough load balancer, which makes it possible to terminate TLS in your backends.

The load balancer listens for incoming traffic on port 443 and routes requests to the GKE service. Notice the ports block in the service definition below: the service directs requests from port 443 to port 8443, the port exposed by our Envoy gateway deployment.

We have created the load balancer on Google Kubernetes Engine (GKE). See the tutorial for more details.

```yaml
apiVersion: v1
kind: Service
metadata:
  annotations:
    networking.gke.io/load-balancer-type: Internal
  labels:
    app: ecom-gw-envoy
  name: ecom-ilb
spec:
  type: LoadBalancer
  externalTrafficPolicy: Local
  ports:
  - name: https
    port: 443
    protocol: TCP
    targetPort: 8443
  selector:
    app: ecom-gw-envoy
```
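After the Service is applied, GKE provisions the load balancer and records its address in the Service status. A small sketch of reading that address with kubectl follows; the function name ilb_ip is hypothetical, and the output stays empty until provisioning has finished.

```shell
# Hypothetical helper: print the internal IP assigned to the load balancer Service.
ilb_ip() {
  service="${1:-ecom-ilb}"
  kubectl get service "$service" \
    --output jsonpath='{.status.loadBalancer.ingress[0].ip}'
}
```

The value can be stored for later use, for example IP="$(ilb_ip)".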

Firewall

To allow the network flow shown in the architecture diagram, you will need to configure the following firewall rules. It is important to allow ingress from GKE nodes that terminate SSL workloads to the client-vm. Google’s Network Connectivity Center helped us troubleshoot the connection and configure the corresponding firewall rules.

In addition to the two firewall rules below that we created explicitly, please note that an automatically created GKE firewall rule is instrumental in allowing the Envoy pods to communicate with the application pods. If you would like to reproduce this deployment with pods sitting in different clusters, you will need to ensure that a firewall rule exists to allow pod-to-pod traffic across those clusters.

Validate the deployment

In this section, we verify that we can communicate with our backend service via Traffic Director. We create another VM (client-vm, as depicted in the diagram above) in a network that has connectivity to the internal load balancer created in the Internal Load Balancer section. Next, we run a simple curl command against our URL to verify connectivity:

```shell
# Grab and store IP of the load balancer.
IP=$(gcloud compute forwarding-rules list --format="value(IP_ADDRESS.scope())")

# -k is to disable SSL validation.
curl https://example.com --resolve example.com:443:$IP -k
```

Note that if you have more than one load balancer, you can apply a filter to the command above that grabs the IP address and stores it in the variable. See the “Filtering and formatting with gcloud” blog post for details.

Some important takeaways from this screenshot are that Envoy routed our request and that one of the pods behind the echo service handled it. If you run kubectl get pods -o wide, you can verify that the IP above matches that of the Envoy gateway pod and that the hostname of your echo pod matches the one shown in the screenshot.
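Because the TLSRoute matches on SNI and ALPN, it can also be useful to probe the gateway with those two values set explicitly. The following is a sketch, assuming openssl is installed on the client VM and the load balancer IP is known; the function name tls_probe is hypothetical.

```shell
# Hypothetical helper: open a TLS connection with an explicit SNI host name
# and ALPN protocol, then close it right away (stdin is /dev/null).
tls_probe() {
  ip="$1"
  sni="${2:-example.com}"
  alpn="${3:-h2}"
  openssl s_client -connect "${ip}:443" -servername "$sni" -alpn "$alpn" </dev/null
}
```

Running tls_probe "$IP" should complete a handshake with the echo backend. Repeating it with a host name that no route matches, for example tls_probe "$IP" other.example.org, would be expected to fail, which helps tell a routing problem apart from a connectivity problem.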

Next Steps

Please refer to Traffic Director Service Routing setup guides to learn more about how to configure other types of routes apart from TLS routes.
