Thursday, October 3, 2024

Extending BigQuery Functions beyond SQL with Remote Functions, now in preview

Today we are announcing the Preview of BigQuery Remote Functions. Remote Functions are user-defined functions (UDFs) that let you extend BigQuery SQL with your own custom code, written and hosted in Cloud Functions, Google Cloud’s scalable, pay-as-you-go functions-as-a-service offering. A remote UDF accepts columns from BigQuery as input, performs actions on that input using a Cloud Function, and returns the result of those actions as a value in the query result. With Remote Functions, you can now write custom SQL functions in Node.js, Python, Go, Java, .NET, Ruby, or PHP. This means you can tailor BigQuery to your company’s needs and keep using the same management and permission models, all without having to manage a server.

In what type of situations could you use remote functions?

Before today, BigQuery customers could create user-defined functions (UDFs) in either SQL or JavaScript that ran entirely within BigQuery. While these functions are performant and fully managed from within BigQuery, customers expressed a desire to extend BigQuery UDFs with their own external code. Here are some examples of what they have asked for:

Security and Compliance: Use data encryption and tokenization services from the Google Cloud security ecosystem for external encryption and de-identification. We’ve already started working with key partners like Protegrity and Microstrategy on using these external functions as a mechanism to merge BigQuery into their security platform, which will help our mutual customers address strict compliance controls.

Real Time APIs: Enrich BigQuery data using external APIs to obtain the latest stock price data, weather updates, or geocoding information.

Code Migration: Migrate legacy UDFs or other procedural functions written in Node.js, Python, Go, Java, .NET, Ruby or PHP.

Data Science: Encapsulate complex business logic and score BigQuery datasets by calling models hosted in Vertex AI or other Machine Learning platforms.

Getting Started

Let’s go through the steps to use a BigQuery remote UDF. 

Set up the BigQuery Connection:
   1. Create a BigQuery Connection 
     a. You may need to enable the BigQuery Connection API

Deploy a Cloud Function with your code:
   1. Deploy your Cloud Function
     a. You may need to enable the Cloud Functions API
     b. You may need to enable the Cloud Build API

   2. Grant the BigQuery Connection service account access to the Cloud Function
     a. One way you can find the service account is by using the bq CLI show command

```bash
bq show --location=US --connection $CONNECTION_NAME
```

Define the BigQuery remote UDF: 
   1. Create the remote UDF definition within BigQuery
     a. One way to find the endpoint name is to use the gcloud CLI functions describe command

```bash
gcloud functions describe $FUNCTION_NAME
```

Use the BigQuery remote UDF in SQL:
   1. Write a SQL statement as you would when calling any other UDF
   2. Get your results! 

How remote functions can help you with common data tasks

Let’s take a look at some examples of how using BigQuery with remote UDFs can help accelerate development and enhance data processing and analysis.

Encryption and Decryption

As an example, let’s create a simple custom encryption and decryption Cloud Function in Python. 

The encryption function can receive the data and return an encrypted base64 encoded string. 

In the same Cloud Function, the decryption function can receive an encrypted base64 encoded string and return the decrypted string. A data engineer would be able to enable this functionality in BigQuery.

The Cloud Function receives the data and determines which function you want to invoke. The data is received as an HTTP request. The additional userDefinedContext fields allow you to send additional pieces of data to the Cloud Function.

```python
import json


def remote_security(request):
    # BigQuery sends the rows and the user-defined context as a JSON body.
    request_json = request.get_json()
    mode = request_json['userDefinedContext']['mode']
    calls = request_json['calls']
    not_extremely_secure_key = 'not_really_secure'
    # Dispatch on the mode supplied by the remote UDF definition.
    if mode == "encryption":
        return encryption(calls, not_extremely_secure_key)
    elif mode == "decryption":
        return decryption(calls, not_extremely_secure_key)
    return json.dumps({"Error in Request": request_json}), 400
```
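To make the request shape concrete, the sketch below builds a payload with the two fields the handler reads, `calls` and `userDefinedContext`, and parses it the same way. The `FakeRequest` class, the `parse_remote_call` helper, and the sample values are hypothetical stand-ins for the HTTP request object that Cloud Functions passes in; a real request may carry additional fields.

```python
import json


class FakeRequest:
    """Hypothetical stand-in for the HTTP request passed to the function."""

    def __init__(self, body: str):
        self._body = body

    def get_json(self):
        # Mirrors the get_json() call used by remote_security.
        return json.loads(self._body)


def parse_remote_call(request):
    """Return (mode, calls) from a remote function request body."""
    request_json = request.get_json()
    mode = request_json["userDefinedContext"]["mode"]
    calls = request_json["calls"]
    return mode, calls


# Each element of "calls" holds the arguments for one input row.
sample_body = json.dumps({
    "userDefinedContext": {"mode": "encryption"},
    "calls": [["alice@example.com"], ["bob@example.com"]],
})

mode, calls = parse_remote_call(FakeRequest(sample_body))
print(mode)
print([call[0] for call in calls])
```

Running a handler against a fake request like this is one way to check the parsing logic locally before deploying.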

The result is returned to BigQuery as a JSON response in a specific format, which BigQuery then parses.

```python
import base64
import json

# Assumption: AES comes from the PyCryptodome package.
from Crypto.Cipher import AES


def encryption(calls, not_extremely_secure_key):
    return_value = []
    for call in calls:
        # Each call holds the arguments for one row; encrypt the first one.
        data = call[0].encode('utf-8')
        cipher = AES.new(
            not_extremely_secure_key.encode('utf-8')[:16],
            AES.MODE_EAX
        )
        cipher_text = cipher.encrypt(data)
        # Prepend the nonce so the decryption side can recover it.
        return_value.append(
            base64.b64encode(cipher.nonce + cipher_text).decode('ascii')
        )
    return json.dumps({"replies": return_value})
```
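The decryption function is not shown in this post. One step it would need is splitting the base64 payload back into nonce and ciphertext, since the encryption function concatenates them before encoding. Here is a minimal sketch of that step using only the standard library. The helper name `split_nonce_and_ciphertext` is hypothetical, and the 16-byte nonce length is an assumption about the AES library’s default for EAX mode when no nonce is supplied.

```python
import base64

NONCE_LEN = 16  # Assumption: default EAX nonce length of the AES library in use.


def split_nonce_and_ciphertext(encoded: str, nonce_len: int = NONCE_LEN):
    """Undo the `nonce + cipher_text` packing used by the encryption function."""
    raw = base64.b64decode(encoded)
    if len(raw) < nonce_len:
        raise ValueError("payload is shorter than the expected nonce")
    return raw[:nonce_len], raw[nonce_len:]


# Round-trip check with placeholder bytes (not real ciphertext).
fake_nonce = bytes(range(16))
fake_cipher_text = b"\x01\x02\x03"
packed = base64.b64encode(fake_nonce + fake_cipher_text).decode("ascii")

nonce, cipher_text = split_nonce_and_ciphertext(packed)
print(nonce == fake_nonce, cipher_text == fake_cipher_text)
```

The recovered nonce would then be passed, together with the same key, to the AES library to decrypt `cipher_text`.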

This Python code is deployed to Cloud Functions, where it waits to be invoked.

Let’s add the user-defined function to BigQuery so we can invoke it from a SQL statement. The user_defined_context option is sent to Cloud Functions as additional context in the request payload, so you can map multiple remote functions to one endpoint.

```sql
CREATE OR REPLACE FUNCTION `<project-id>.demo.decryption` (x STRING) RETURNS STRING
REMOTE WITH CONNECTION `<project-id>.us.my-bq-cf-connection`
OPTIONS (
  endpoint = 'https://us-central1-<project-id>.cloudfunctions.net/remote_security',
  user_defined_context = [("mode", "decryption")]
)
```
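Because the mode travels in user_defined_context, a second remote function can point at the same endpoint with a different mode. As a rough sketch, the helper below generates the paired statements from one template. The `remote_function_ddl` helper, the `my-project` project ID, and the `demo.encryption` function name are hypothetical; only the decryption definition appears in this post.

```python
def remote_function_ddl(project_id: str, mode: str) -> str:
    """Build a CREATE FUNCTION statement for one mode of the shared endpoint."""
    endpoint = f"https://us-central1-{project_id}.cloudfunctions.net/remote_security"
    return (
        f"CREATE OR REPLACE FUNCTION `{project_id}.demo.{mode}` (x STRING) "
        f"RETURNS STRING\n"
        f"REMOTE WITH CONNECTION `{project_id}.us.my-bq-cf-connection`\n"
        f"OPTIONS (endpoint = '{endpoint}', "
        f'user_defined_context = [("mode", "{mode}")])'
    )


# Both functions share one endpoint and differ only in name and mode.
statements = [
    remote_function_ddl("my-project", mode)
    for mode in ("encryption", "decryption")
]
for statement in statements:
    print(statement, end="\n\n")
```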

Once we’ve created our functions, users with the right IAM permissions can use them in SQL on BigQuery.

If you’re new to Cloud Functions, be aware that a function can incur a brief delay, known as a “cold start”, when a new instance has to be started.

The neat thing is you can call APIs as well, which is how our partners at Protegrity and Voltage enable their platforms to perform encryption and decryption of BigQuery data.

Calling APIs to enrich your data

Users such as data analysts can easily use the user-defined functions that have been created, without needing other tools or moving the data out of BigQuery.

You can enrich your dataset with many more APIs, for example, the Google Cloud Natural Language API to analyze sentiment on your text without having to use another tool.

```python
import json

from google.cloud import language_v1


def call_nlp(calls):
    return_value = []
    client = language_v1.LanguageServiceClient()
    for call in calls:
        # Analyze the sentiment of the first argument of each row.
        text = call[0]
        document = language_v1.Document(
            content=text, type_=language_v1.Document.Type.PLAIN_TEXT
        )
        sentiment = client.analyze_sentiment(
            request={"document": document}
        ).document_sentiment
        return_value.append(str(sentiment.score))
    return_json = json.dumps({"replies": return_value})
    return return_json
```

Once the Cloud Function is deployed and the remote UDF definition is created on BigQuery, you are able to invoke the NLP API and return the data from it for use in your queries.

Custom Vertex AI endpoint

Data scientists can call custom models on Vertex AI endpoints, as well as other APIs, all from the SQL console.

Remember, the remote UDFs are meant for scalar executions.
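In practice, scalar means the function returns exactly one value per input row, in the same order as the incoming `calls`. Below is a minimal sketch of a helper that keeps to this shape. The names `scalar_replies` and `score_row` are hypothetical, and the `errorMessage` key is assumed to follow the remote function response format for failed requests.

```python
import json


def scalar_replies(calls, row_fn):
    """Apply row_fn to each call's arguments and return (body, status)."""
    try:
        replies = [row_fn(*call) for call in calls]
    except Exception as exc:  # Report any row failure as a failed request.
        return json.dumps({"errorMessage": str(exc)}), 400
    # One reply per call, in the same order as the input rows.
    return json.dumps({"replies": replies}), 200


def score_row(text):
    """Hypothetical stand-in for a model call: returns the text length."""
    return len(text)


body, status = scalar_replies([["good"], ["terrible"]], score_row)
print(status, body)
```

Building the reply list with a single comprehension over `calls` keeps the one-reply-per-row invariant easy to see and hard to break.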

You are able to deploy a model to a Vertex AI endpoint, which is another API, and then call that endpoint from Cloud Functions.

```python
from google.cloud import aiplatform
from google.cloud.aiplatform.gapic.schema import predict
from google.protobuf import json_format
from google.protobuf.struct_pb2 import Value


def predict_classification(calls):
    # Vertex AI endpoint details.
    # project, location, endpoint_id and client_options are assumed to be
    # defined elsewhere in the module.
    client = aiplatform.gapic.PredictionServiceClient(client_options=client_options)
    endpoint = client.endpoint_path(
        project=project, location=location, endpoint=endpoint_id
    )
    # Call the endpoint once for each input row.
    for call in calls:
        content = call[0]
        instance = predict.instance.TextClassificationPredictionInstance(
            content=content,
        ).to_value()
        instances = [instance]
        parameters_dict = {}
        parameters = json_format.ParseDict(parameters_dict, Value())
        response = client.predict(
            endpoint=endpoint, instances=instances, parameters=parameters
        )
```
