BigQuery’s Remote Functions (in preview) make it possible to apply custom cloud functions to your warehouse without moving data or managing compute. This flexibility unlocks many use cases including data enrichment. In this post we demonstrate a pattern for combining BigQuery with the Google Maps API to add drive times to datasets containing origin and destination locations. This enrichment pattern is easily adapted for address geocoding or adding Google Map’s place descriptions to locations. After enriching location data further spatial analysis is possible such as route and drive time optimization or geographic clustering and cohort analysis
There are three steps to using BigQuery Remote Functions:
Create a Cloud Function in your favorite language. The function will be responsible for accepting a BigQuery request and returning an appropriate response.
Enable a BigQuery external connection and then register your remote function following these instructions. This step establishes the connection between BigQuery and your function.
Write a SQL query that executes your remote function.
Behind the scenes, BigQuery and Cloud Functions do the hard work of distributing your data, running your function in parallel, and returning your results. Developers supply functions that SQL analysts can use and scale to large datasets.
Drive Time Optimization Example
Many companies, from retailers to delivery services, face the daunting challenge of optimizing deliveries from warehouses to customers. While many databases can compute the geographic distance between a warehouse and a customer, a better metric to optimize is the driving distance or drive time between the two locations.
In this example we have a dataset with a table containing warehouse locations and a second table containing customer locations. In order to conduct analysis with drive time we need to enrich our data by calculating the drive times between warehouses and customers. This information is available through the Google Maps API and can be added to BigQuery by creating and calling a remote function. The following Python code demonstrates an example implementation.
NOTE: A real implementation will include input validation, error handling, and performance optimizations. This code is functional but simplified to highlight the distinct requirements of a BigQuery remote function.
The function is deployed as a Cloud Function and then connected to BigQuery. To do so, enable a BigQuery external connection and then register your remote function following these instructions. This step establishes the connection between BigQuery and your function.The remote function is displayed in BigQuery.
Once deployed, we can use the function in a SQL query to calculate the drive times. The SQL query can be executed on demand or scheduled. This example query uses the prior Python function to calculate the drive time between customers in Tennessee and a set of distribution centers across the Southeast.
NOTE: To run this query you will need to create a dataset and register a remote function. Replace my_dataset.drivetime with the appropriate name of your dataset and remote function.
This data can be used for further analysis such as:
Stocking distribution centers based on the number of customers best served by each warehouse.
Hiring drivers based on total required drive time at each warehouse.
Running what-if scenarios to calculate drive time saved for new potential warehouse locations.
Next Steps
The Google Maps API is only one example of what is possible with BigQuery remote functions. Review the preview documentation to get started, or learn about other examples of spatial analysis powered by BigQuery.
Cloud BlogRead More