Introducing partitioning and clustering recommendations for optimizing BigQuery usage

By mullaned2002

May 25, 2023


Do you have a lot of BigQuery tables? Do you find it hard to keep track of which ones are partitioned and clustered, and which ones could be? If so, we have good news. We’re launching a partitioning and clustering recommender that will do the work for you! The recommender analyzes all your organization’s workloads and tables and identifies potential cost optimization opportunities. And the best part is, it’s completely free!

“The BigQuery partitioning and clustering recommendations are awesome! They have helped our customers identify areas where they can reduce costs, improve performance, and optimize our BigQuery usage.” Sky, one of Europe’s leading media and communications companies

How does the recommender work?

Partitioning divides a table into segments, while clustering sorts the table based on user-defined columns. Both methods can improve the performance of certain types of queries, such as queries that use filter clauses and queries that aggregate data.
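To make the distinction concrete, here is a toy Python sketch of the idea. It is not how BigQuery stores data; the `Row` type and the `partition_by_day` and `scan` helpers are hypothetical names invented for illustration. Rows are split into per-day segments (partitioning) and sorted by customer inside each segment (clustering), so a query that filters on a single day reads only that segment.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Row:
    day: date
    customer: str
    amount: float


def partition_by_day(rows):
    """Group rows into per-day segments, then sort each segment by customer."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row.day].append(row)
    for segment in partitions.values():
        # Stand-in for clustering: keep each segment ordered by a chosen column.
        segment.sort(key=lambda r: r.customer)
    return dict(partitions)


def scan(partitions, day):
    """Return the rows for one day and how many rows had to be read."""
    # Pruning: segments for other days are never touched.
    segment = partitions.get(day, [])
    return segment, len(segment)


rows = [
    Row(date(2023, 5, 1), "carol", 10.0),
    Row(date(2023, 5, 1), "alice", 20.0),
    Row(date(2023, 5, 2), "bob", 5.0),
    Row(date(2023, 5, 3), "alice", 7.5),
]
partitions = partition_by_day(rows)
matched, rows_read = scan(partitions, date(2023, 5, 1))
print([r.customer for r in matched], rows_read, "of", len(rows))
```

In this toy run the filtered query reads 2 of the 4 rows, which is the kind of reduced scan the recommender looks for at much larger scale.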

BigQuery’s partitioning and clustering recommender analyzes each project’s workload execution over the past 30 days to look for suboptimal scans of the table data. The recommender then uses machine learning to estimate the potential savings and generate final recommendations. The process has four key steps: Candidate Generation, Read Pattern Analyzer, Write Pattern Analyzer, and Generate Recommendations.

Candidate Generation is the first step in the process, where tables and columns are selected based on specific criteria. For partitioning, tables larger than 100 GB are chosen, and for clustering, tables larger than 10 GB are chosen. Smaller tables are filtered out because the optimization benefit is smaller and less predictable. Then we identify columns that meet BigQuery’s partitioning and clustering requirements.
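The size filter in this step can be sketched in a few lines of Python. This is only an illustration of the thresholds quoted above, read as gigabytes, not the recommender's actual code: `TableInfo` and `candidate_optimizations` are hypothetical names, the use of decimal gigabytes is an assumption, and the follow-up check that columns meet BigQuery's partitioning and clustering requirements is not modeled.

```python
from dataclasses import dataclass

GB = 10**9  # assumption: decimal gigabytes; the post does not define the unit
PARTITION_MIN_BYTES = 100 * GB  # partitioning candidates must be larger than this
CLUSTER_MIN_BYTES = 10 * GB     # clustering candidates must be larger than this


@dataclass(frozen=True)
class TableInfo:
    name: str
    size_bytes: int


def candidate_optimizations(table):
    """Return the optimizations a table is large enough to be considered for."""
    options = []
    if table.size_bytes > PARTITION_MIN_BYTES:
        options.append("partitioning")
    if table.size_bytes > CLUSTER_MIN_BYTES:
        options.append("clustering")
    return options


tables = [
    TableInfo("events", 250 * GB),
    TableInfo("sessions", 40 * GB),
    TableInfo("lookup", 2 * GB),
]
for table in tables:
    print(table.name, candidate_optimizations(table))
```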

In the Read Pattern Analyzer step, the recommender analyzes the logs of queries that filter on the selected columns to determine their potential for cost savings through partitioning or clustering. Several metrics, such as filter selectivity, potential file pruning, and runtime, are considered, and machine learning is used to estimate the potential slot time saved if partitioning or clustering is applied.

The Write Pattern Analyzer step is then used to estimate the cost that partitioning or clustering may introduce during write time. Write patterns and table schema are analyzed to determine the net savings from partitioning or clustering for each column.

Finally, in Generate Recommendations, the output from both the Read Pattern Analyzer and Write Pattern Analyzer is used to determine the net savings from partitioning or clustering for each column. If the net savings are positive and meaningful, the recommendations are uploaded to the Recommender API with proper IAM permissions.
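The net-savings decision can be pictured with a small Python sketch. Everything specific here is an assumption made for illustration: the `ColumnEstimate` type, the slot-hour unit, the example numbers, and the `min_net_savings` threshold standing in for "positive and meaningful". The real recommender derives its estimates with machine learning and does not publish its cutoff.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnEstimate:
    column: str
    read_savings_slot_hours: float  # hypothetical output of the read analysis
    write_cost_slot_hours: float    # hypothetical output of the write analysis

    @property
    def net_savings(self):
        """Estimated read savings minus the estimated extra write cost."""
        return self.read_savings_slot_hours - self.write_cost_slot_hours


def recommend(estimates, min_net_savings=1.0):
    """Keep columns whose net savings clear the threshold, best first."""
    kept = [e for e in estimates if e.net_savings >= min_net_savings]
    return sorted(kept, key=lambda e: e.net_savings, reverse=True)


estimates = [
    ColumnEstimate("event_date", 120.0, 15.0),  # large net gain
    ColumnEstimate("user_id", 8.0, 9.5),        # write cost outweighs savings
    ColumnEstimate("country", 3.2, 3.0),        # positive but too small to matter
]
for estimate in recommend(estimates):
    print(estimate.column, estimate.net_savings)
```

With these made-up numbers only `event_date` survives: `user_id` has negative net savings, and `country` is positive but below the illustrative threshold.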

Discovering BigQuery partitioning and clustering recommendations

You can access these recommendations via a few different channels:

Via the lightbulb or idea icon in the top right of BigQuery’s UI page

On our console via the Recommendation Hub

Via our Recommender API

You can also export the recommendations to BigQuery using BigQuery Export.
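For the Recommender API route, a minimal Python sketch might look like the following. It assumes the `google-cloud-recommender` client library is installed and that application default credentials are configured. The recommender ID below is the one we understand BigQuery uses for this feature, but treat it as an assumption and confirm it against the public documentation. The helper names and the project and location values are placeholders.

```python
# Assumed recommender ID for partitioning and clustering; verify in the docs.
RECOMMENDER_ID = "google.bigquery.table.PartitionClusterRecommender"


def recommender_parent(project_id, location):
    """Build the resource path that the Recommender API expects as `parent`."""
    return (
        f"projects/{project_id}/locations/{location}"
        f"/recommenders/{RECOMMENDER_ID}"
    )


def list_partition_cluster_recommendations(project_id, location):
    """Fetch recommendations for one project and location."""
    # Imported lazily so the path helper works without the client library.
    from google.cloud import recommender_v1  # pip install google-cloud-recommender

    client = recommender_v1.RecommenderClient()
    parent = recommender_parent(project_id, location)
    return list(client.list_recommendations(parent=parent))


print(recommender_parent("my-project", "us"))
```

Calling `list_partition_cluster_recommendations("my-project", "us")` with a real project would return recommendation objects whose `name` and `description` fields can be printed or stored. The caller needs IAM permission to view recommendations for that project.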

To learn more about the recommender, please see the public documentation.

We hope you use BigQuery partitioning and clustering recommendations to optimize your BigQuery tables, and can’t wait to hear your feedback and thoughts about this feature. Please feel free to reach us at [email protected].

Source: Cloud Blog
