Sunday, October 13, 2024
No menu items!
HomeCloud ComputingPreviewing the power of BigQuery Remote Functions for drive time optimization

Previewing the power of BigQuery Remote Functions for drive time optimization

BigQuery’s Remote Functions (in preview) make it possible to apply custom cloud functions to your warehouse without moving data or managing compute. This flexibility unlocks many use cases including data enrichment. In this post we demonstrate a pattern for combining BigQuery with the Google Maps API to add drive times to datasets containing origin and destination locations. This enrichment pattern is easily adapted for address geocoding or adding Google Map’s place descriptions to locations. After enriching location data further spatial analysis is possible such as route and drive time optimization or geographic clustering and cohort analysis

There are three steps to using BigQuery Remote Functions:

Create a Cloud Function in your favorite language. The function will be responsible for accepting a BigQuery request and returning an appropriate response.

Enable a BigQuery external connection and then register your remote function following these instructions. This step establishes the connection between BigQuery and your function.

Write a SQL query that executes your remote function.

Behind the scenes, BigQuery and Cloud Functions do the hard work of distributing your data, running your function in parallel, and returning your results. Developers supply functions that SQL analysts can use and scale to large datasets.

Drive Time Optimization Example

Many companies, from retailers to delivery services, face the daunting challenge of optimizing deliveries from warehouses to customers. While many databases can compute the geographic distance between a warehouse and a customer, a better metric to optimize is the driving distance or drive time between the two locations.

In this example we have a dataset with a table containing warehouse locations and a second table containing customer locations. In order to conduct analysis with drive time we need to enrich our data by calculating the drive times between warehouses and customers. This information is available through the Google Maps API and can be added to BigQuery by creating and calling a remote function. The following Python code demonstrates an example implementation.

NOTE: A real implementation will include input validation, error handling, and performance optimizations. This code is functional but simplified to highlight the distinct requirements of a BigQuery remote function.

code_block[StructValue([(u’code’, u’# Cloud functions can use existing librariesrnimport googlemapsrnrn# The entrypoint function handles accepts a BQ requestrn# calls the python function that returns drivetimern# and formats the results into a response for BQrnrndef drivetime_from_latlon(request):rn # initiate a googlemaps client usingrn # a runtime environment variable storing a cloud secretrn api_key = os.environ.get(‘MAPS_GEOCODE_API_KEY’)rn gmaps = googlemaps.Client(key=api_key)rn # process the BQ requestrn request_json = request.get_json()rn rows = request_json[‘calls’]rn # iterate through rows with two columns: origin, destinationrn return_value =[]rn for row in rows:rn return_value.append(get_drivetime(row[0], row[1], gmaps)) rn # format the result as BQ expectsrn replies = [float(x) for x in return_value]rn return_json = json.dumps( { “replies” : replies} rn return(return_json)rn# A standard python function to return drivetime rn# Other functions, such as routes and places, are availablern# The function could be modified to call the googlemaps API rn# with multiple values rndef get_drivetime(origin, destination, gmaps):rn results = gmaps.distance_matrix([origin], [destination])rn drivetime_sec = results[‘rows’][0][‘elements’][0][‘duration’][‘value’]rn return(json.dumps(drivetime_sec))’), (u’language’, u”)])]

The function is deployed as a Cloud Function and then connected to BigQuery. To do so, enable a BigQuery external connection and then register your remote function following these instructions. This step establishes the connection between BigQuery and your function.The remote function is displayed in BigQuery.

Once deployed, we can use the function in a SQL query to calculate the drive times. The SQL query can be executed on demand or scheduled. This example query uses the prior Python function to calculate the drive time between customers in Tennessee and a set of distribution centers across the Southeast. 

NOTE: To run this query you will need to create a dataset and register a remote function. Replace my_dataset.drivetime with the appropriate name of your dataset and remote function.

code_block[StructValue([(u’code’, u’WITH shipping AS (rntSELECT rnt CONCAT(users.first_name, ” “, users.last_name) AS destination_name,rnt CONCAT(‘{\”lat\”: ‘, users.latitude, rntt ‘,\”lng\”: ‘, users.longitude, ‘}’) AS destination,rnt dc.name AS origin_name,rnt CONCAT(‘{\”lat\”: ‘, dc.latitude, rntt ‘, \”lng\”: ‘, dc.longitude, ‘}’) AS originrn FROM `looker-private-demo`.thelook.usersrntCROSS JOIN `looker-private-demo`.thelook.distribution_centers AS dcrntWHERE users.state=”Tennessee” rn)rnSELECTrn my_dataset.drivetime(shipping.origin, shipping.destination) AS drivetime,rn shipping.destination_name,rn shipping.origin_namernFROM shipping rnORDER BY 1, 2′), (u’language’, u”)])]

This data can be used for further analysis such as:

Stocking distribution centers based on the number of customers best served by each warehouse.

Hiring drivers based on total required drive time at each warehouse.

Running what-if scenarios to calculate drive time saved for new potential warehouse locations.

Next Steps

The Google Maps API is only one example of what is possible with BigQuery remote functions. Review the preview documentation to get started, or learn about other examples of spatial analysis powered by BigQuery.

Related Article

Leveraging BigQuery Public Boundaries datasets for geospatial analytics

Here we’ll show you how to join first party data onto the BigQuery Public Boundaries Datasets for comprehensive geospatial analytics

Read Article

Cloud BlogRead More

RELATED ARTICLES

LEAVE A REPLY

Please enter your comment!
Please enter your name here

Most Popular

Recent Comments