Tuesday, September 17, 2024

DCPerf: An open source benchmark suite for hyperscale compute applications

We are open-sourcing DCPerf, a collection of benchmarks that represents the diverse categories of workloads that run in data center cloud deployments.
We hope that DCPerf can be used more broadly by academia, the hardware industry, and internet companies to design and evaluate future products.
DCPerf is available now on GitHub.

Hyperscale and cloud datacenter deployments constitute the largest market share of server deployments in the world today. Workloads developed by large-scale internet companies running in their datacenters have very different characteristics from those in high performance computing (HPC) or traditional enterprise market segments. Server design considerations, trade-offs, and objectives for datacenter use cases therefore also differ significantly from those of other market segments, and they require a different set of benchmarks and a different evaluation methodology. Existing benchmarks fall short of capturing these characteristics and hence do not provide a reliable basis for designing and optimizing modern servers and datacenters.

Introducing DCPerf

Meta developed DCPerf, a collection of benchmarks to represent the diverse categories of workloads that run in cloud deployments. Each benchmark within DCPerf is designed by referencing a large application within Meta’s production server fleet. 

To ensure the benchmarks are representative, we used several new techniques to analyze production workloads, at levels ranging from low-level hardware microarchitecture features to application and library usage profiles, and to capture the important characteristics of these workloads in DCPerf. Designing and optimizing hardware and software for future server platforms using these benchmarks will more closely translate into improved efficiency of hyperscaler production deployments.

DCPerf’s design process.

Over the past few years, we have continuously enhanced these benchmarks to make them compatible with different instruction set architectures, including x86 and ARM. We also validated that the benchmarks can be used to evaluate emerging industry trends (e.g., chiplet-based architectures) and added support for multi-tenancy so that the benchmarks can scale and make use of rapidly increasing core counts on modern server platforms.

Using DCPerf to improve Meta’s compute server designs

We have been using DCPerf internally, in addition to the SPEC CPU benchmark suite, for product evaluation at Meta to make the right configuration choices for our data center deployments. DCPerf also helps us make early performance projections that are used for capacity planning, identify performance bugs in hardware and system software, and jointly optimize the platform with our hardware industry collaborators. 

DCPerf covers a much more diverse set of application software than existing benchmarks such as SPEC CPU and gives better coverage signals on platform performance. Because of these benefits, we have also started using DCPerf to assist with our decision-making process on which platforms to deploy in our data centers.

DCPerf captures the core and SoC microarchitecture characteristics of data center applications. This graph compares the instructions per cycle (IPC) of production applications, DCPerf, and SPEC CPU. Red circles highlight that DCPerf more accurately represents the IPC of production applications.
DCPerf more closely captures the power and frequency characteristics of data center applications. This graph compares the average core frequency of production applications, DCPerf and SPEC CPU. Red circles highlight that DCPerf more accurately represents the frequency characteristics of production applications.
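
The IPC comparison in the first graph rests on a simple ratio: retired instructions divided by core cycles. The following is a minimal sketch of how that ratio and a benchmark-versus-production gap could be computed. The `CounterSample` container, the function names, and the sample counts are assumptions made for illustration; they are not part of DCPerf and are not measurements from the post.

```python
from dataclasses import dataclass


@dataclass
class CounterSample:
    """Hardware counter totals for one workload run (hypothetical container)."""
    name: str
    instructions: int  # retired instructions
    cycles: int        # unhalted core cycles


def ipc(sample: CounterSample) -> float:
    """Instructions per cycle: retired instructions divided by core cycles."""
    if sample.cycles <= 0:
        raise ValueError("cycles must be positive")
    return sample.instructions / sample.cycles


def relative_ipc_error(benchmark: CounterSample, production: CounterSample) -> float:
    """Absolute IPC gap between benchmark and production, as a fraction of production IPC."""
    prod = ipc(production)
    return abs(ipc(benchmark) - prod) / prod


if __name__ == "__main__":
    # Placeholder counts chosen only to exercise the functions; not measurements.
    production = CounterSample("production_app", instructions=1_000, cycles=1_000)
    candidate = CounterSample("candidate_benchmark", instructions=1_200, cycles=1_000)
    print(f"relative IPC error: {relative_ipc_error(candidate, production):.2f}")
```

A smaller relative error means the benchmark's IPC sits closer to that of the production application it is meant to stand in for. The same pattern would apply to the average core frequency comparison in the second graph.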

Improving state-of-the-art computing platforms with our hardware industry collaborators using DCPerf

Over the last two years we have collaborated with leading CPU vendors to further validate DCPerf on pre-silicon and early-silicon setups, to debug performance issues, and to identify hardware and system software optimizations for their roadmap products. In multiple instances we have been able to identify performance optimizations in areas such as CPU core microarchitecture settings and SoC power management.

The graphic below shows areas of HW/SW design where we have seen DCPerf be representative of production usage, deliver relevant performance signals, and help with optimizations, as well as areas of future work.

We are thankful for our collaborators’ support and contributions in using DCPerf to drive innovation in such an important and complex area, and we expect to continue improving the benchmarks with new releases over time to adapt to emerging technologies.

Enabling innovations through open collaboration

Today, we are open-sourcing DCPerf with the goal of creating a collaborative, open source reference benchmark that can be used to design, develop, debug, optimize, and improve the state of the art in compute platform designs for hyperscale.

As an open source benchmark suite, DCPerf has the potential to become an industry-standard method for capturing the important characteristics of compute workloads that run in hyperscale datacenter deployments.

Get DCPerf on GitHub

DCPerf is available now on GitHub

The post DCPerf: An open source benchmark suite for hyperscale compute applications appeared first on Engineering at Meta.
