Advancing systems research with open-source Google workload traces

By mullaned2002

May 5, 2022

With the rapid expansion of the internet and cloud computing, warehouse-scale computing (WSC) workloads (search, email, video sharing, online maps, online shopping, etc.) have reached planetary scale and are driving the lion’s share of growth in computing demand. WSC workloads also differ from other workloads in their requirements for on-demand scalability, elasticity and availability.

Many studies (e.g., Profiling a warehouse-scale computer) and books (e.g., The Datacenter as a Computer: Designing Warehouse-Scale Machines) have pointed out that WSC workloads have fundamentally different characteristics from traditional benchmarks and require changes to modern computer architecture to achieve optimal efficiency. Google workloads have data and instruction footprints that exceed the capacity of modern CPU caches, so the CPU spends a significant portion of its time waiting for code and data. Simply increasing memory bandwidth would not solve the problem, because many accesses are on the critical path of application request processing; reducing memory access latency is just as important as increasing memory bandwidth.
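To make the footprint argument concrete, here is a minimal sketch of what happens when a working set outgrows a cache. It is an illustration only: the address stream is synthetic, the function names (`miss_rate`, `cyclic_sweep`) are hypothetical, and the cache is modeled as fully associative with LRU replacement, which is a simplification of real CPU caches. It does not read or assume anything about the published trace format.

```python
from collections import OrderedDict


def miss_rate(addresses, cache_lines, line_bytes=64):
    """Fraction of accesses that miss in a fully associative LRU cache.

    Assumption: a toy model, not a description of any real CPU.
    """
    cache = OrderedDict()  # keys are cache-line numbers, oldest first
    misses = 0
    total = 0
    for addr in addresses:
        total += 1
        line = addr // line_bytes
        if line in cache:
            cache.move_to_end(line)  # mark as most recently used
        else:
            misses += 1
            cache[line] = True
            if len(cache) > cache_lines:
                cache.popitem(last=False)  # evict the least recently used line
    return misses / total if total else 0.0


def cyclic_sweep(footprint_lines, passes, line_bytes=64):
    """Synthetic stream: touch every line of the footprint in order, repeatedly."""
    return [i * line_bytes for _ in range(passes) for i in range(footprint_lines)]


# Footprint fits in the cache: only the first pass misses.
small = miss_rate(cyclic_sweep(256, 10), cache_lines=512)
# Footprint is twice the cache: LRU evicts each line before it is reused.
large = miss_rate(cyclic_sweep(1024, 10), cache_lines=512)
print(small, large)
```

In this toy model, the footprint that fits misses only on its first pass, while the footprint that is twice the cache size misses on every access. Real workloads have far less regular access patterns, which is why address traces from production workloads are more informative than synthetic streams like this one.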

Over the years, the computer architecture community has expressed the need for WSC workload traces to perform architecture research. Today, we are pleased to announce that we’ve published select Google workload traces. These traces will help systems designers better understand how WSC workloads perform as they interact with underlying components, and develop new solutions for front-end and data-access bottlenecks.

We captured these workload traces using DynamoRIO on servers running Google workloads; you can find more details at https://dynamorio.org/google_workload_traces.html. To protect user privacy, these traces contain only instruction and memory addresses.

We have found these traces useful for understanding WSC workloads and for seeding internal research on processor front-ends, on-die interconnects, caches and memory subsystems, all areas that greatly impact WSC workloads. For example, we used these traces to develop AsmDB. Likewise, we hope these traces will enable the computer architecture community to develop new ideas that improve the performance and efficiency of other WSC workloads.
