BigQuery is a serverless and cost-effective enterprise data warehouse that works across cloud environments and scales with your data. As with any large scale data-intensive platform, following best practices and avoiding inefficient anti-patterns goes a long way in terms of performance and cost savings.
Usually SQL optimization requires a significant time investment from engineers, who must read high-complexity queries, devise a variety of approaches to improve performance and efficiency, and test several optimization techniques. The best place to start is to fix anti-patterns, since this only requires easily applicable changes and provides significant performance improvements.
To facilitate the task of identifying and fixing said anti-pattern, Google Professional Services Organization (PSO) and Global Services Delivery (GSD) have developed a BigQuery anti-pattern recognition tool. This tool automates the process of scanning SQL queries, identifying antipatterns, and providing optimization recommendations.
What is the BigQuery anti-pattern recognition tool?
The BigQuery anti-pattern recognition tool let you easily identify performance impacting anti-patterns across a large number of SQL queries in a single go.
It utilizes ZetaSQL to parse BigQuery SQL queries into abstract syntax trees (AST) and then traverses the tree nodes to detect the presence of anti-patterns.
The tool takes a BigQuery SQL query as an input, such as:
And produces the output as:
It examines potential optimizations, including:
Selecting only the necessary columns
Handling multiple WITH-clause references
Addressing subqueries in filters with aggregations
Optimizing ORDER BY queries with LIMIT
Enhancing string comparisons
Improving JOIN patterns
Avoiding subquery aggregation in the WHERE clause
The solution supports reading from various sources, such as:
Command line
Local files
Cloud Storage files
Local folders
Cloud Storage folders
CSV (with one query per line)
INFORMATION_SCHEMA
Additionally, the solution provides flexibility in writing output to different destinations, including:
Printing to the terminal
Exporting as CSV
Writing to a BigQuery table
Using the BigQuery anti-pattern recognition tool
The BigQuery anti-pattern recognition tool is hosted on GitHub. Below are the Quick Start steps on using the tool via command line for inline queries. You can also leverage Cloud Run to deploy it as a container on cloud.
Prerequisites
Linux OS
JDK 11 or above is installed
Maven
Docker
gcloud CLI
Quick start: – steps
1. Clone the repo into your local machine.
2. Build the tool image inside the `bigquery-antipattern-recognition` folder.
3. Run the tool for a simple inline query.
4. Below is the output result in the command-line interface:
Additionally, the above tool can read queries from Information Schema and load the output recommendations to a BigQuery table.
Below is an example of the BigQuery anti-pattern recognition tool results exported to a BigQuery table.
Getting started
Ready to start optimizing your BigQuery queries and cutting costs? Check out the tool here and contribute to the tool via GitHub.
Have questions or feedback?
We’re actively working on new features to make the tool as useful to our customers. Use it and tell us what you think! For product feedback/technical questions, reach out to us at [email protected]. If you’re already a BigQuery customer and would like a briefing on the tool, please reach out, we’d be happy to talk.
Cloud BlogRead More