Real-time analytics made easy with Cloud Spanner federated queries with BigQuery

By mullaned2002

August 28, 2021

1441

As enterprises compete for market share, their need for real-time insights has given rise to increased demand for transactional databases to support data analytics. Whether it’s to provide dashboards that inform rapid decision-making, to perform analysis on a lookup table stored in a transactional database, or to conduct complex hybrid transactional analytical workloads, there is a growing demand for analyzing large volumes of data in transactional databases.

Cloud Spanner is Google Cloud’s fully managed relational database for transactional workloads, and today, with the general availability of Spanner federated queries with BigQuery, it gets even more powerful. BigQuery is Google Cloud’s market-leading serverless, highly scalable, multi-cloud data warehouse that makes analytics easy by bringing together data from multiple sources. With Spanner’s BigQuery federation, you can query data residing in Spanner in real time without moving or copying the data, bridging the gap between operational data and analytics and creating a unified data lifecycle.

It’s already been possible to use tools like Dataflow to copy data from Spanner over to BigQuery, but if you haven’t set up these ETL workflows — or you simply need to do a quick lookup on data that’s in Spanner — you can now take advantage of BigQuery’s query federation support to run real-time queries on data that’s stored in Spanner.

In this post, we’ll look at how to set up a federated query in BigQuery that accesses data stored in Spanner.

How to run a federated query

Suppose you’re an online retailer that uses Spanner to store your shopping transactions, and you have a Customer 360 application built in BigQuery. You can now use federated queries to include this customer’s shopping transactions in your Customer 360 application without needing to copy the data over to BigQuery from Spanner.

To run a Customer 360 query in BigQuery that includes the shopping transactions that are stored in Spanner, follow these steps:

Launch BigQuery and choose the Google Cloud project that contains the Spanner instance that includes the shopping transactions database.

Set up an external data source for the Spanner shopping database in BigQuery. You’ll need to have bigquery.admin permissions to set this up.

Write a query in BigQuery that accesses the shopping data in the Spanner data source. If you’d like other users to access this external data source in BigQuery, simply grant them permission to use the connection resource you just created.

Setting up an external data source

To setup a Spanner external data source, select “Add data” and choose “External Data Source”

From here, add the connection settings for your Spanner database.

Writing a query that accesses the Spanner data source

Once you’ve created the external data source, it will be listed as one of the external connections in your project in the BigQuery Explorer.

Now, you simply use the EXTERNAL_QUERY function to send a query statement to Spanner, using Spanner’s native SQL dialect. The results are converted to BigQuery standard SQL data types and can be joined with other data in BigQuery.

Here’s the syntax for using EXTERNAL_QUERY:

SELECT * FROM EXTERNAL_QUERY(connection_id, external_database_query);

Connection_id(string): The name of the Spanner database connection resource. It is of the form: projects/projectID/locations/us/connections/ConnectionID. ConnectionID is the connection ID you created when you set up the external data source.

external_database_query (string): a read-only query in Spanner’s SQL dialect. The query is executed in Spanner and returned as a table in BigQuery.

For example:

And it’s as simple as that. There’s no need to copy data from Spanner over to BigQuery — EXTERNAL_QUERY takes care of everything. Moreover, EXTERNAL_QUERY returns a table that is no different from any other table in BigQuery, so you can JOIN to it, add it to materialized and authorized views, populate a dashboard with it, and even schedule the query using BigQuery’s scheduler.

Advantages of using Spanner federated queries with BigQuery

When you use query federation, the query is run by Spanner, and the results are returned to BigQuery as a table.

The data is accessed in real-time; there’s no need to wait for an ETL job to complete for the freshest of data.

The query will be executed by Spanner, so you’ll be able to take advantage of the same query optimizer and SQL capabilities that you’ll find in Spanner. You’ll also have the option to take advantage of Spanner’s PartitionQuery capabilities.

You’ll likely find EXTERNAL_QUERY is useful for lookups and simple analytics in Spanner tables rather than intensive analytics that would benefit from BigQuery’s strong analytic capabilities.

For more intensive analytics, you can still use EXTERNAL_QUERY to copy the data over to BigQuery rather than writing ETL jobs, and continue to run the analytics in BigQuery, such as the following:

Learn more

To get started with Spanner, create an instanceor try it out with a Spanner Qwik Start.

To learn more about Spanner federated queries with BigQuery, see the documentation.

Cloud BlogRead More

Previous articleReplicating from Cloud Spanner to BigQuery at scale

Next articleTowards a Reliable Device Management Platform

Real-time analytics made easy with Cloud Spanner federated queries with BigQuery

How to run a federated query

Setting up an external data source

Writing a query that accesses the Spanner data source

Advantages of using Spanner federated queries with BigQuery

Learn more

Leverage enterprise data with Denodo and Vertex AI for generative AI applications

TypeScript takes aim at truthy and nullish bugs

Hex-LLM: High-efficiency large language model serving on TPUs in Vertex AI Model Garden

LEAVE A REPLY Cancel reply

Most Popular

Schneider Electric automates Salesforce account hierarchy management with generative artificial intelligence (AI) using Amazon Aurora and Amazon Bedrock

Leverage enterprise data with Denodo and Vertex AI for generative AI applications

TypeScript takes aim at truthy and nullish bugs

Make relevant movie recommendations using Amazon Neptune, Amazon Neptune Machine Learning, and Amazon OpenSearch Service

Recent Comments

EDITOR PICKS

Exploring the Click Element Variable in Google Tag Manager

How to track events with Google Tag Manager and Google Analytics

Data Layer Variable in GTM: What, Why, and Where?

POPULAR POSTS

3 strategies for enterprise cloud finops

Product Scoop – November 2022

Fine-Tuning Stable Diffusion with LoRA

POPULAR CATEGORY

Real-time analytics made easy with Cloud Spanner federated queries with BigQuery

How to run a federated query

Setting up an external data source

Writing a query that accesses the Spanner data source

Advantages of using Spanner federated queries with BigQuery

Learn more

Registration is open for Google Cloud Next: October 12–14

LEAVE A REPLY Cancel reply

Most Popular

Recent Comments

EDITOR PICKS

POPULAR POSTS

POPULAR CATEGORY