Load testing I/O Adventure with Cloud Run

By mullaned2002

November 30, 2022

In 2021 and 2022, Google I/O invited remote attendees to meet each other and explore a virtual world in the I/O Adventure online conference experience powered by Google Cloud technologies. (For more details on how this was done, see my previous post.)

When building an online experience like I/O Adventure, it’s important to decide on a provisioning strategy early on. Before we could make that decision, however, we needed to have a reasonably accurate estimate of how many attendees to expect.

Estimating the actual number of attendees in advance is easier for an in-person event than it is for a (free) online event. We recognized that this number could vary wildly from our best guesses, in either direction. The only safe strategy was to design a server architecture that would be able to handle much more traffic than we actually expected. Since the I/O Adventure experience was going to be live for only a few days during the conference, we determined that it would be affordable to overprovision by spinning up many server instances before the event started.

To further ensure that heavy traffic would not degrade the attendees’ experience, we decided to implement a queue. Our system would welcome as many potential simultaneous attendees as it could smoothly support, and steer additional users to a waiting queue. In the unlikely event that the actual traffic exceeded the large allocated capacity, the queue would prevent the system from becoming overly congested.

Designing a scalable cloud architecture for our project was one thing. Making sure that it would actually be able to support a heavy load was quite another! This post describes how we performed load testing on the cloud backend as a whole system, rather than on individual cloud components such as a single VM.

When load testing an entire cloud backend, you need to address several concerns that are not necessarily accounted for in component-level testing. Quotas are a good example. Quota issues are difficult to foresee before you actually hit them. And, we needed to consider many quotas! Do we have enough quota to spin up 200 VMs? More specifically, do we have enough for the type of machine (E2, N2, etc.) that we use? And even more specifically, in the cloud region where the project is deployed?

Pouring thousands of bots into the I/O Adventure world

Infrastructure

To design the load test and simulate thousands of attendees, we had to take two key factors into account:

Attendees communicate from their browser with the cloud backend using WebSockets.

A typical attendee session lasts for at least 15 minutes without disconnecting, which is long-lived compared to some common load testing methodologies more focused on individual HTTP requests.

I/O Adventure Attendee-Server communication between attendee’s browser and a GKE server pod, using WebSockets

While it is possible to set up a load test suite by provisioning and managing VMs, I have a strong preference for serverless solutions like Cloud Functions and Cloud Run. So, I wanted to know if we could use one of them to simulate user sessions – to essentially play the role of a load injector. Does Google Cloud’s serverless infrastructure support the necessary protocol – WebSockets – for this use case?

Yes! It turns out that Cloud Run supports WebSockets for egress and ingress, and has a configurable per-request timeout up to 1 hour.

Load test Client-Server communication using Cloud Run communicating via WebSockets with GKE pods

In the load test we mimicked a typical attendee session, in which a WebSocket connection transmits thousands of messages over several minutes, without disconnecting.

Concurrency

On the backend, the I/O Adventure servers handle thousands of simultaneous attendees by:

Accepting 500 attendees in each “shard”, where each shard is a server representing a part of the whole conference world, in which attendees can interact with each other;

Having hundreds of independent, preprovisioned shards;

Running several shards in each GKE Node;

Routing each incoming attendee to a shard with free capacity.
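
The routing rule in the last item can be sketched in a few lines of Node.js. This is only an illustration, not the actual I/O Adventure router: the function names, the in-memory shard list, and the "first shard with room" policy are assumptions. Only the limit of 500 attendees per shard comes from the description above.

```javascript
// Hypothetical sketch: route each attendee to a shard with free capacity.
const SHARD_CAPACITY = 500; // per-shard limit stated in the post

// Build a fixed pool of preprovisioned shards (the count is illustrative).
function createShards(count) {
  return Array.from({ length: count }, (_, id) => ({ id, attendees: 0 }));
}

// Return the id of the first shard with room, or null when every shard is
// full (the point at which a waiting queue would take over).
function routeAttendee(shards) {
  const shard = shards.find((s) => s.attendees < SHARD_CAPACITY);
  if (!shard) return null;
  shard.attendees += 1;
  return shard.id;
}

// Two shards can hold 1000 attendees, so the 1001st is not admitted.
const shards = createShards(2);
const assignments = [];
for (let i = 0; i < 1001; i++) assignments.push(routeAttendee(shards));
console.log(assignments[0], assignments[500], assignments[1000]); // 0 1 null
```

A real router would also have to release capacity when attendees disconnect. The sketch leaves that out to keep the admission rule visible.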

On the client side (the load injector), we implemented multiple levels of concurrency:

Each trigger (e.g. an HTTPS request from my workstation initiated with curl) can launch many concurrent sessions of 15 minutes and wait for their completion.

Each Cloud Run instance can handle many concurrent triggering requests (maximum concurrent requests per instance).

Cloud Run automatically starts new instances when the existing instances approach their full capacity. A Cloud Run service can scale to hundreds or thousands of container instances as needed.

We created an additional Cloud Run service specifically for triggering more simultaneous requests to the main Cloud Run injector as a way to amplify the load test.

Simulating a single attendee story

A simulated “user story” is a load test scenario that consists of logging in, being routed to the GKE pod of a shard, making a few hundred random attendee movements for 15 minutes, and disconnecting.
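
As a rough illustration of such a story, the sketch below walks through the same steps against a stand-in client. The `client` interface (`login`, `connect`, `send`, `close`), the message format, and the shortened timings are all hypothetical. The real scenario talks to the login service and to a GKE pod over a WebSocket, which this sketch deliberately does not do.

```javascript
// Hypothetical sketch of one attendee story. The `client` interface and the
// message shape are invented for illustration.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

function randomMove() {
  const directions = ["up", "down", "left", "right"];
  const direction = directions[Math.floor(Math.random() * directions.length)];
  return { type: "move", direction };
}

async function runStory(client, { moves, durationMs }) {
  const token = await client.login();
  const connection = await client.connect(token); // routed to a shard pod
  try {
    const pauseMs = durationMs / moves; // spread the moves over the session
    for (let i = 0; i < moves; i++) {
      connection.send(JSON.stringify(randomMove()));
      await sleep(pauseMs);
    }
  } finally {
    connection.close(); // always disconnect, even if a send fails
  }
  return { moves };
}

// Stand-in client that records messages instead of opening a WebSocket.
function createFakeClient() {
  const sent = [];
  let closed = false;
  return {
    sent,
    isClosed: () => closed,
    login: async () => "fake-token",
    connect: async () => ({
      send: (message) => sent.push(message),
      close: () => { closed = true; },
    }),
  };
}

// Demo run: 50 moves over 50 ms instead of a few hundred over 15 minutes.
const fakeClient = createFakeClient();
const storyDone = runStory(fakeClient, { moves: 50, durationMs: 50 });
storyDone.then((result) => console.log(result.moves, fakeClient.sent.length));
```

Passing the client in as a parameter keeps the scenario logic separate from the transport, so the same function could be pointed at a real WebSocket client.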

For this project, I ran the simulation in Cloud Run, and I kicked off the test by issuing a curl command from my laptop. In this setup, the scenario (story) initiates a connection as a WebSocket client, and the pods are WebSocket servers.

Injecting one attendee story

Initiating many attendee stories with one trigger

We created an injector service handler (implemented as a Node.js script) to start many stories in parallel, and wait for their completion.
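
A minimal sketch of that "start many, wait for all" pattern is shown below. It is not the actual injector code: `runStories` and `fakeStory` are hypothetical names, and the stand-in story is just a timer. The sketch assumes that one failed story should not abort the others, which is why it uses `Promise.allSettled` rather than `Promise.all`.

```javascript
// Hypothetical sketch: start `count` stories at once and wait for all of them.
async function runStories(count, startStory) {
  const results = await Promise.allSettled(
    Array.from({ length: count }, (_, index) => startStory(index))
  );
  const finished = results.filter((r) => r.status === "fulfilled").length;
  return { started: count, finished, failed: count - finished };
}

// Stand-in story: settles after a short delay and fails for indexes 19 and 39.
function fakeStory(index) {
  return new Promise((resolve, reject) => {
    setTimeout(() => {
      if (index % 20 === 19) reject(new Error("dropped"));
      else resolve(index);
    }, 5);
  });
}

// 40 stories per trigger, as in the test run described in the Results.
const summaryPromise = runStories(40, fakeStory);
summaryPromise.then((summary) => console.log(summary));
```

A request handler built this way would send its response only after `runStories` resolves, so the trigger request stays open for the whole duration of the stories.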

Injecting many attendee stories with one trigger

Injecting many attendee stories with several concurrent triggers

I can multiply the load by triggering the injector many times concurrently with curl, from a command line terminal:

    for i in {1..10}; do
      curl -X POST "https://fancy-load-test.run.app" &
    done
    wait

Cloud Run automatically scales up by spinning up new injector machines (i.e., Cloud Run instances) when needed.

Injecting attendee stories through many curl requests, triggering many Cloud Run injector instances (Login service details omitted for clarity.)

Injecting more attendee stories through an extra Cloud Run service

In the previous setup, my workstation became the bottleneck as I was launching too many long-lived trigger requests with curl. My workstation limited the number of TCP connections, and struggled to keep up with the CPU load for all the SSL handshakes.

We fixed this by creating a new intermediate service specifically for dealing with a large number of triggers. We tried several parameter values (number of stories per trigger, max requests per Cloud Run instance, etc.) to maximize the injector’s throughput.

Injecting attendee stories through an extra Cloud Run trigger service

Note the BigQuery component, used for log ingestion on both the injector side and the server side.

Measuring the success rate

Unlike HTTP requests, which have an explicit response code, WebSocket messages are unidirectional and by default don’t expect an acknowledgement.

To keep track of how many stories have run successfully to completion, we wrote a few events (login, start, finish…) to the standard output and activated a logging sink to stream all of the logs to BigQuery. The events were logged from the point of view of the clients (the injector) and from the point of view of the servers (GKE pods).

This made it very convenient, with aggregate SQL queries, to:

make sure that at least 99% of all the stories did finish successfully, and

make sure the stories did not take more time than expected.
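
The two checks above were done with SQL queries in BigQuery. The sketch below expresses the same logic in plain JavaScript over an in-memory array, purely to make the computation concrete. The event shape (`storyId`, `event`, `timestampMs`), the function name, and the 16-minute limit are assumptions, not the actual log schema or thresholds.

```javascript
// Hypothetical event shape: { storyId, event: "start" | "finish", timestampMs }.
function summarizeStories(events, maxDurationMs) {
  // Group the timestamps of each story's events by story id.
  const stories = new Map();
  for (const { storyId, event, timestampMs } of events) {
    const story = stories.get(storyId) ?? {};
    story[event] = timestampMs;
    stories.set(storyId, story);
  }
  let started = 0;
  let finished = 0;
  let tooSlow = 0;
  for (const story of stories.values()) {
    if (story.start === undefined) continue;
    started += 1;
    if (story.finish === undefined) continue;
    finished += 1;
    if (story.finish - story.start > maxDurationMs) tooSlow += 1;
  }
  const successRate = started === 0 ? 0 : finished / started;
  return { started, finished, tooSlow, successRate };
}

// Made-up sample data: 200 stories, one never finishes, one runs 17 minutes.
const MINUTE = 60_000;
const events = [];
for (let id = 0; id < 200; id++) {
  events.push({ storyId: id, event: "start", timestampMs: 0 });
  if (id === 0) continue; // story 0 never finishes
  const duration = id === 1 ? 17 * MINUTE : 15 * MINUTE;
  events.push({ storyId: id, event: "finish", timestampMs: duration });
}

const report = summarizeStories(events, 16 * MINUTE);
console.log(report.successRate >= 0.99, report.tooSlow); // true 1
```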

We also kept an eye on several Grafana dashboards to monitor live metrics of our GKE cluster, and make sure that CPU, memory, and bandwidth resources didn’t get overwhelmed.

Visual check

As a bonus, it was very fun to connect as a “real” attendee and watch hundreds of bots running everywhere!

Connecting to the system under stress with a browser also enabled us to assess the subjective experience. We could see, for example, how smooth the animations were when a shard was hosting 500 attendees and the frontend was rendering dozens of moving avatars.

Results

With a total of 4000 triggers for 40 stories each, and a max concurrency of 40 requests per Cloud Run instance, our tests used just over 100 instances and successfully injected 160,000 simultaneous active attendees. We ran this load test script several times over a few days, for a total cost of about $100. The test took advantage of the full capacity of all of the server CPU cores (used by the GKE cluster) that our quota allowed. Mission accomplished!
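
The figures in this paragraph fit together with simple arithmetic, spelled out below. The shard count on the last line is derived from the 500-attendee shard capacity mentioned earlier and is not a number stated in this post. The variable names are illustrative.

```javascript
// Numbers taken from the test run described above.
const triggers = 4000;
const storiesPerTrigger = 40;
const maxRequestsPerInstance = 40;
const shardCapacity = 500;

// Every trigger runs its stories concurrently, so the totals multiply.
const simultaneousAttendees = triggers * storiesPerTrigger; // 160000

// Each instance serves at most 40 trigger requests at a time, so at least
// 100 instances are needed, consistent with the "just over 100" observed.
const minInjectorInstances = Math.ceil(triggers / maxRequestsPerInstance); // 100

// Derived estimate (not stated in the post): full shards needed for that load.
const shardsNeeded = Math.ceil(simultaneousAttendees / shardCapacity); // 320

console.log(simultaneousAttendees, minInjectorInstances, shardsNeeded);
```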

We learned that:

The cost of the load test was acceptable.

The quotas we needed to raise were the number of specific CPU cores and the number of external IPv4 addresses.

Our platform would successfully sustain a load target of 160K attendees.

During the actual event, the peak traffic turned out to be less than the maximum supported load. (As a result, no attendees had to wait in the queue that we had implemented.) Following our tests, we were confident that the backend would handle the target load without any major issues, and it did.

Of course, Cloud Run and Cloud Run Jobs can handle many types of workloads, not only website backends and load tests. Take some time to explore them further and think about where you can put them to use in your own workflows!
