Principles for designing a chargeback process
As large organizations increase their cloud footprint, it becomes critical to ensure costs are being managed effectively. A good understanding of the cost of running each workload, and the value that workload brings to your business, allows organizations to have confidence in the efficiency of their cloud consumption — which is why we see many customers who are successful in driving financial accountability and cost efficiency embrace the Cloud FinOps framework.
One crucial capability of Cloud FinOps is chargeback, which is the process of mapping cloud consumption to internal consumers within your organization and facilitating recovery of cloud services costs. This empowers each individual team to be responsible for their cloud consumption, aligning incentives and providing transparency. We have seen customers successfully implement chargeback by combining detailed billing data exported from Cloud Billing with organizational data using the Business Intelligence tooling of their choice.
As a cloud architect, it’s possible — likely — that you’ve never built out a chargeback process before. This blog post will walk you through some best practices in designing an effective chargeback process in Google Cloud.
Determine the granularity of chargeback
Large organizations must choose the level of granularity with which to allocate cloud usage. Typically, usage costs are allocated per team or organization unit, though you may choose a more or less granular approach. There is no correct answer — you should allocate costs at whatever level fits your broader business objectives.
Differentiate between consumer workloads and platform services
Large organizations typically have two distinct types of workload — consumer workloads wholly owned by a single team, and platform workloads that provide multi-tenant services to internal customers. Cost allocation needs to be treated differently for each workload type. Typically, consumer workloads can be allocated entirely to a single team for chargeback, whereas platform services must be broken down by tenant consumption and allocated accordingly.
Label projects and resources to differentiate usage
Use labels or tags to differentiate between usage that should be allocated to different teams. Labels are key-value pairs that you can apply to organize resources. These labels are included in the Cloud Billing export, so they can be used to allocate resource usage to cost centers or teams.
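As a minimal sketch of this idea, the function below groups billing rows by the value of a hypothetical `team` label. The row shape loosely mirrors the billing export's `cost` and repeated `labels` fields, but the data and label key are illustrative, not real export output:

```python
# Sketch: aggregate exported billing rows by a hypothetical "team" label.
# Row shape loosely mirrors the BigQuery billing export (cost, labels),
# but these rows are illustrative sample data.

def cost_by_label(rows, label_key):
    """Sum cost per value of the given label; unlabeled usage is bucketed as 'unattributed'."""
    totals = {}
    for row in rows:
        labels = {l["key"]: l["value"] for l in row.get("labels", [])}
        bucket = labels.get(label_key, "unattributed")
        totals[bucket] = totals.get(bucket, 0.0) + row["cost"]
    return totals

rows = [
    {"cost": 120.0, "labels": [{"key": "team", "value": "payments"}]},
    {"cost": 80.0,  "labels": [{"key": "team", "value": "search"}]},
    {"cost": 15.5,  "labels": []},  # shared or unlabeled usage
]
print(cost_by_label(rows, "team"))
# {'payments': 120.0, 'search': 80.0, 'unattributed': 15.5}
```

In practice this grouping is usually done in BigQuery directly over the export table, but the shape of the computation is the same.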
Use chargeback as an incentive to drive efficient spend
Designing a chargeback system for a large organization is ultimately about designing incentives for teams to proactively monitor and manage their cloud spend. This lens should be used to make decisions about allocating usage of different types — if one team is allocated usage that a different team owns and runs, it is unlikely to result in an efficient use of resources, as the incentive to reduce spend has been removed.
Align attribution to product pricing structure
Google Cloud services are billed in different ways — some use all-inclusive, usage-based rates, while others have detailed SKU breakdowns for different types of usage. It is generally easiest and most effective to allocate usage for a given service in the same way it is billed. For example, Cloud Storage breaks down pricing into three components: data storage, data access, and network usage.
Use billing export to BigQuery
To group and allocate spend and merge it with your cost center data, you should enable Cloud Billing export to BigQuery. There are three types of reports available — standard usage cost data, detailed usage cost data, and pricing data. The detailed usage cost data is particularly useful for allocating detailed spend from platform services. The pricing data includes SKU-level pricing applicable to your Billing Account, and is useful for building workload price estimates with discounted custom pricing.
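To illustrate the "merge with your cost center data" step, here is a small sketch that joins exported cost rows to an internal project-to-cost-center mapping. The project names and mapping are hypothetical; in practice this join is typically done in BigQuery against the exported standard usage cost table:

```python
# Sketch: merge billing export rows with an internal project-to-cost-center
# mapping. Project IDs and the mapping are illustrative assumptions.

def allocate_to_cost_centers(export_rows, project_to_cost_center):
    """Sum exported cost per cost center; unmapped projects are surfaced explicitly."""
    totals = {}
    for row in export_rows:
        center = project_to_cost_center.get(row["project_id"], "unmapped")
        totals[center] = totals.get(center, 0.0) + row["cost"]
    return totals

mapping = {"proj-payments": "CC-100", "proj-search": "CC-200"}
rows = [
    {"project_id": "proj-payments", "cost": 42.0},
    {"project_id": "proj-search", "cost": 10.0},
    {"project_id": "proj-sandbox", "cost": 3.0},  # not yet mapped
]
print(allocate_to_cost_centers(rows, mapping))
# {'CC-100': 42.0, 'CC-200': 10.0, 'unmapped': 3.0}
```

Surfacing an explicit "unmapped" bucket is a useful design choice: it makes gaps in your organizational mapping visible rather than silently dropping cost.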
Decide which costs should be attributed
Not all costs need to be attributed — some organizations choose to allocate shared resources like network egress at the organization level to reduce complexity. Normally, the higher the proportion of costs that are allocated, the stronger the incentive is for teams to efficiently manage spend. However, choosing to allocate many shared resources adds complexity to your chargeback system.
BigQuery flat-rate commitments
BigQuery is a serverless data warehouse used by large organizations for cost-effective analytics, machine learning, and BI. Large organizations typically use flat-rate commitments for dedicated processing capacity. Flat-rate slot commitments provide stable monthly costs, but can make it tricky to attribute costs to internal consumers.
Clarify which billing model you use
BigQuery pricing is divided into two components: analysis pricing and storage pricing. Within analysis pricing, queries can run under either on-demand pricing or flat-rate pricing. On-demand pricing offers a pay-as-you-use model, which simplifies chargeback compared to the flat-rate model, where slots are purchased for a reservation and shared across all queries in that reservation. For details, see Organizing BigQuery resources.
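Under on-demand pricing, chargeback for a query is simply a function of the bytes it scans. The sketch below assumes a placeholder per-TiB rate; the actual rate applicable to your account should come from the pricing data export rather than a hardcoded constant:

```python
# Illustrative on-demand chargeback: cost scales with bytes scanned per query.
# The per-TiB rate is an assumed placeholder; use the rate from your Cloud
# Billing account's pricing export instead.
ON_DEMAND_USD_PER_TIB = 6.25  # assumption, not a confirmed price

def on_demand_cost(bytes_scanned):
    """Estimate the on-demand analysis cost of a query from bytes scanned."""
    return bytes_scanned / 2**40 * ON_DEMAND_USD_PER_TIB

# A query scanning 3 TiB:
print(round(on_demand_cost(3 * 2**40), 2))
# 18.75
```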
Attribute flat-rate slot usage
If you use flat-rate pricing for BigQuery, chargeback for query jobs can be calculated using the total number of slot milliseconds consumed by each team's or project's query jobs. This data is available through either the "Analysis Slots Attribution" line item in the Cloud Billing data or the INFORMATION_SCHEMA tables within BigQuery itself. Charging back reservation costs based on total slot resources consumed will encourage teams to right-size their reservations and avoid relying on idle slot usage for production workloads.
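A proportional split of a reservation's monthly cost by slot-milliseconds can be sketched as follows. The team names and slot-ms figures are illustrative; in practice they would come from the billing export or INFORMATION_SCHEMA job metadata:

```python
# Sketch: split a flat-rate reservation's cost in proportion to slot-ms
# consumed by each team. Idle (unconsumed) slot time is implicitly spread
# across all consumers in proportion to their usage.

def chargeback_flat_rate(reservation_cost, slot_ms_by_team):
    total = sum(slot_ms_by_team.values())
    return {team: reservation_cost * ms / total
            for team, ms in slot_ms_by_team.items()}

usage = {"analytics": 600_000, "ml": 300_000, "reporting": 100_000}  # illustrative slot-ms
print(chargeback_flat_rate(10_000.0, usage))
# {'analytics': 6000.0, 'ml': 3000.0, 'reporting': 1000.0}
```

Note that because idle capacity is spread proportionally, heavy consumers absorb more of the idle cost — which is exactly the incentive to right-size the reservation.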
Consider shared reservations
Most of the queries that an organization runs on BigQuery analyze data; these are known as query jobs. However, large organizations sometimes also use slots for load jobs to provide predictable performance. Some organizations also choose to use a "default" pool of slots at the top of their reservation hierarchy to provide shared slots for larger queries. These types of slot reservations can only be attributed to internal teams at an individual query level. For simplicity, you might choose to attribute these reservations centrally rather than allocating them proportionally based on query resource usage.
Multi-tenant GKE clusters
Google Kubernetes Engine makes running enterprise-scale Kubernetes clusters simple, with built-in support for automatic provisioning and management and pod and cluster autoscaling. Large organizations often run multi-tenant GKE clusters, sometimes with several thousand nodes, to provide internal consumers with provisioned access to compute resources in the cloud. For these multi-tenant clusters, resource usage must be collected and exported before it can be attributed to internal consumers.
Choose how to differentiate usage
In Kubernetes clusters, individual pods make resource requests to the cluster that are used to make scheduling decisions. Once scheduled, actual resource usage can be tracked by the cluster. In multi-tenant clusters, you need a way to track the usage of workloads from different teams so that usage can be charged back. In GKE, you can differentiate resource usage by using Kubernetes namespaces, labels, or a combination of both.
Enable GKE Cost Allocation
GKE cost allocation tracks information about the resource requests of the workloads running on your cluster. Currently, it collects information about Compute Engine CPU core, RAM, and GPU requests in your cluster. This data is exported to BigQuery, where it can be queried and incorporated into your chargeback system. GKE cost allocation data is based on resource requests, not resources consumed. To export data about resources consumed in your GKE cluster, you can use cluster usage metering.
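Once per-namespace request data is in hand, pricing it is a matter of multiplying request-hours by unit rates. The rates and the input shape below are assumptions for illustration; real rates come from joining the export with billing SKU prices:

```python
# Sketch: price per-namespace CPU and RAM requests using assumed unit rates.
# Rates and the input shape are illustrative assumptions; real data comes
# from the GKE cost allocation export joined with billing SKU prices.
CPU_USD_PER_CORE_HOUR = 0.031  # assumed rate
RAM_USD_PER_GIB_HOUR = 0.004   # assumed rate

def namespace_request_cost(requests):
    """requests: {namespace: {"cpu_core_hours": x, "ram_gib_hours": y}}"""
    return {
        ns: r["cpu_core_hours"] * CPU_USD_PER_CORE_HOUR
            + r["ram_gib_hours"] * RAM_USD_PER_GIB_HOUR
        for ns, r in requests.items()
    }

costs = namespace_request_cost(
    {"payments": {"cpu_core_hours": 1000.0, "ram_gib_hours": 4000.0}}
)
print(round(costs["payments"], 2))
# 47.0
```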
Map usage to cost information
Usage metering exports resource usage information, but does not include cost data. To query cost information broken down by namespace or label, join the exported resource usage data with the Google Cloud billing export data for each SKU. Example queries are available in the GKE usage metering documentation.
Decide which usage information to chargeback
Pods running in Kubernetes carry three types of usage information — resource requests, resource limits, and actual resource usage. Requests represent resources requested by pods and reserved by Kubernetes. If only actual resource usage is charged back to consumers, there is no penalty for over-sized resource requests, which leads to stranded resources and cluster under-utilization. For this reason, some customers choose to charge back resource requests instead, incentivizing consumers to right-size their workload requests.
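The incentive gap is easy to see with numbers. In this illustrative comparison (the rate and figures are assumptions), a team that requests 10 cores but uses only 2 pays a fraction of the true reservation cost under usage-based chargeback:

```python
# Illustrative comparison of usage-based vs request-based chargeback.
# The per-core-hour rate and workload figures are assumed for the example.
RATE = 0.031  # assumed USD per core-hour

requested_cores, used_cores, hours = 10, 2, 720  # one month, 8 cores stranded

usage_bill = used_cores * hours * RATE      # what the team pays if billed on usage
request_bill = requested_cores * hours * RATE  # what the team pays if billed on requests

print(round(usage_bill, 2), round(request_bill, 2))
# 44.64 223.2
```

Billing on requests makes the team bear the cost of the 8 stranded cores, which is the pressure needed to right-size.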
Consider unallocated resources
Unallocated resources are cluster resources that are not used by any workload running on the cluster. Whether your cluster is statically provisioned or autoscaled, there will be some unallocated resources in the cluster. These unallocated resources are included in GKE usage metering, and can be allocated to a central team or proportionally spread across all teams using the cluster.
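The proportional-spread option can be sketched as a simple top-up of each team's allocated cost. Team names and amounts here are illustrative:

```python
# Sketch: spread unallocated cluster cost across teams in proportion to
# each team's already-allocated cost. Figures are illustrative.

def spread_unallocated(unallocated_cost, allocated_cost_by_team):
    total = sum(allocated_cost_by_team.values())
    return {
        team: cost + unallocated_cost * cost / total
        for team, cost in allocated_cost_by_team.items()
    }

print(spread_unallocated(100.0, {"team-a": 300.0, "team-b": 100.0}))
# {'team-a': 375.0, 'team-b': 125.0}
```

A sanity check worth building in: the totals before and after the spread should differ by exactly the unallocated amount.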
Compute Engine commitments
Compute Engine provides customers with secure and customizable virtual machines on Google’s infrastructure, and access to a wide variety of virtual machine types for workloads of different types. Larger organizations typically have a predictable and consistent baseline use of resources in Compute Engine, and use committed use discounts to reduce cost. Committed use discounts offer steep discounts for continuous usage, but can complicate the process of attributing cost to internal consumers.
Use committed use discount sharing
By default, committed use discounts are applied only to the project where they are purchased. To maximize cost savings, we see many organizations enable CUD sharing, which extends CUD coverage across all projects linked to your Cloud Billing account. This allows usage from any project matching the committed resources to be attributed to the commitment, increasing commitment utilization and reducing overall costs.
Use commitment attribution
Attribution refers to how resource benefits and commitment costs shared at the Cloud Billing account level are divided among projects. If commitments are left unattributed, subscription fees and credits are applied as projects consume eligible usage. If proportional attribution is used, the credits and subscription fees are applied in proportion to each project's share of total eligible usage. Finally, if prioritized attribution is used, credits and subscription fees are applied based on the distribution you specify. For more information, see attribution of committed use discount fees and credits.
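The proportional mode can be sketched in a few lines. The project names, fee, and credit amounts are illustrative; in Cloud Billing this attribution happens automatically once the mode is selected:

```python
# Sketch of proportional attribution: CUD subscription fees and credits are
# divided among projects in proportion to eligible usage. All figures are
# illustrative; Cloud Billing performs this split for you.

def proportional_attribution(commitment_fee, credits, eligible_usage):
    total = sum(eligible_usage.values())
    return {
        project: {
            "fee": commitment_fee * usage / total,
            "credit": credits * usage / total,
        }
        for project, usage in eligible_usage.items()
    }

# proj-a drives 3x the eligible usage of proj-b, so receives 3x the fee and credit.
print(proportional_attribution(700.0, -1000.0, {"proj-a": 3.0, "proj-b": 1.0}))
# {'proj-a': {'fee': 525.0, 'credit': -750.0}, 'proj-b': {'fee': 175.0, 'credit': -250.0}}
```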
Shared networking costs
Many large organizations use Dedicated Interconnect for private connectivity between their on-premises hosts and Google Cloud. This networking infrastructure is typically centralized and shared by internal teams, so must be apportioned in order to be charged back effectively.
Attribute shared networking charges
Dedicated Interconnect is billed on an hourly basis for both Interconnect connections and VLAN attachments, and usage is attributed to the project that owns the resource. There is also a charge for egress across the Interconnect, which is attributed to the project that owns the VLAN attachment. Depending on your networking setup, consumer projects may generate traffic that is routed on-premises across a Dedicated Interconnect, but may not own any Interconnect connections or VLAN attachments. In this case, you will need to manually attribute Dedicated Interconnect usage back to consumer projects to accurately reflect traffic usage.
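One manual approach is to apportion the shared Interconnect cost by each consumer project's share of cross-Interconnect traffic. The project names and byte counts below are illustrative; traffic volumes would typically come from VPC Flow Logs or network metrics:

```python
# Sketch: apportion a shared Dedicated Interconnect's monthly cost by each
# consumer project's share of egress traffic. Byte counts are illustrative;
# real volumes would come from VPC Flow Logs or monitoring metrics.

def attribute_interconnect(monthly_cost, egress_bytes_by_project):
    total = sum(egress_bytes_by_project.values())
    return {project: monthly_cost * b / total
            for project, b in egress_bytes_by_project.items()}

traffic = {"app-1": 8e12, "app-2": 2e12}  # bytes sent on-premises this month
print(attribute_interconnect(500.0, traffic))
# {'app-1': 400.0, 'app-2': 100.0}
```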
You get what you pay for
Designing and implementing an effective chargeback process allows internal teams to proactively monitor the cost of the resources they use, empowering them to operate their workloads efficiently. Google Cloud provides access to granular billing information that can be used to build a chargeback process, though there are some tips and tricks to ensure it is effective.
To learn more about Cloud FinOps and cost management in Google Cloud, check out these resources: