Introducing new SQL functions to manipulate your JSON data in BigQuery

By mullaned2002

August 8, 2023

403

Enterprises are generating data at an exponential rate, spanning traditional structured transactional data, semi-structured like JSON and unstructured data like images and audio. Beyond the scale of the data, these divergent types present processing challenges for developers, at times requiring separate processing flows for each. With its initial release BigQuery’s support for semi-structured JSON eliminated the need for such complex preprocessing and providing schema flexibility, intuitive querying and the scalability benefits afforded to structured data.

Today, we are excited to announce the release of new SQL functions for BigQuery JSON, extending the power and flexibility of our core JSON support. These functions make it even easier to extract and construct JSON data and perform complex data analysis.

With these new query functions, you can:

Convert JSON values into primitive types (INT64, FLOAT64, BOOL and STRING) in an easier and more flexible way with the new JSON LAX functions

Easily update and modify an existing JSON value in BigQuery with the new JSON mutator functions.

Construct JSON object and JSON array with SQL in BigQuery with the new JSON constructor functions.

Let’s review these new features and some examples of how to use them. First, we will create a table for demonstration.

code_block[StructValue([(u’code’, u’CREATE TABLE dataset_name.users_sample AS (rn SELECT JSON ‘{“name”: “Alice”, “age”: 28, “address”: {“country”: “USA”, “city”: “SF”, “zipcode”: 94102}}’ AS user UNION ALLrn SELECT JSON ‘{“name”: “Bob”, “age”: “40”, “address”: {“country”: “Germany”}}’ UNION ALLrn SELECT JSON ‘{“name”: “Charlie”, “age”: null, “address”: {“zipcode”: 12356, “country”: null}}’rn)’), (u’language’, u”), (u’caption’, <wagtail.wagtailcore.rich_text.RichText object at 0x3ea8d7b55a10>)])]

code_block[StructValue([(u’code’, u’Query:rnrn– Table contentsrnSELECT * FROM dataset_name.users_sample ORDER BY STRING(user.name);rnOutput:rn+———————————————————————————–+rn| user |rn+———————————————————————————–+rn| {“address”:{“city”:”SF”,”country”:”USA”,”zipcode”:94102},”age”:28,”name”:”Alice”} |rn| {“address”:{“country”:”Germany”},”age”:”40″,”name”:”Bob”} |rn| {“address”:{“country”:null,”zipcode”:12356},”age”:null,”name”:”Charlie”} |rn+———————————————————————————–+’), (u’language’, u”), (u’caption’, <wagtail.wagtailcore.rich_text.RichText object at 0x3ea8d7b55b90>)])]

Great! Let’s say we want to get a list of all users over 30. Looking at the table, you will see that user.age contains a JSON number in the first record, a JSON string in the second, and a JSON null in the third. With the new powerful LAX function, LAX_INT64, all types are automatically inferred and processed correctly.

code_block[StructValue([(u’code’, u’Query:rnrnrnSELECT user.name FROM dataset_name.users_samplernWHERE LAX_INT64(user.age) > 30rnOutput:rn+——-+rn| name |rn+——-+rn| “Bob” |rn+——-+’), (u’language’, u”), (u’caption’, <wagtail.wagtailcore.rich_text.RichText object at 0x3ea8d7b55dd0>)])]

Unlike the “strict” conversion functions, which require that the JSON type matches the primitive type exactly, the “lax” conversion functions will also handle conversions between mismatched data types.

For example, the strict conversion function below would return an error:

code_block[StructValue([(u’code’, u’Query:rnrnSELECT INT64(JSON ‘”10″‘) AS strict_int64rnOutput:rnError: The provided JSON input is not an integer’), (u’language’, u”), (u’caption’, <wagtail.wagtailcore.rich_text.RichText object at 0x3ea8d7c65ad0>)])]

However, the lax conversion function below would return the desired result:

code_block[StructValue([(u’code’, u’Query:rnrnSELECT LAX_INT64(JSON ‘”10″‘) AS lax_int64rnOutput:rn+———–+rn| lax_int64 |rn+———–+rn| 10 |rn+———–+’), (u’language’, u”), (u’caption’, <wagtail.wagtailcore.rich_text.RichText object at 0x3ea8d7c659d0>)])]

Furthermore, you can quickly and easily remove a field in the JSON data by using JSON_REMOVE functions.

code_block[StructValue([(u’code’, u’Query:rnrnrnUPDATE dataset_name.users_sample SET user = JSON_REMOVE(user, “$.address.zipcode”)rnWHERE truernAfter the query above; if you run the query u201cSELECT * FROM dataset_name.users_sample ORDER BY STRING(user.name);u201d, you will receive the following Output:rnrnrn+——————————————————————-+rn| user |rn+——————————————————————-+rn| {“address”:{“city”:”SF”,”country”:”USA”},”age”:28,”name”:”Alice”} |rn| {“address”:{“country”:”Germany”},”age”:”40″,”name”:”Bob”} |rn| {“address”:{“country”:null},”age”:null,”name”:”Charlie”} |rn+——————————————————————-+’), (u’language’, u”), (u’caption’, <wagtail.wagtailcore.rich_text.RichText object at 0x3ea8d7c65710>)])]

JSON_STRIP_NULLS compresses the data by removing JSON nulls. Although BigQuery null values neither impact performance nor storage cost, it can be helpful for reducing data size during exports.

code_block[StructValue([(u’code’, u’Query:rnrnrnUPDATE dataset_name.users_sample SET user = JSON_STRIP_NULLS(user, remove_empty=>true) WHERE truernAfter the query above; if you run the query u201cSELECT * FROM dataset_name.users_sample ORDER BY STRING(user.name);u201d, you will receive the following Output:rnrnrn+——————————————————————-+rn| user |rn+——————————————————————-+rn| {“address”:{“city”:”SF”,”country”:”USA”},”age”:28,”name”:”Alice”} |rn| {“address”:{“country”:”Germany”},”age”:”40″,”name”:”Bob”} |rn| {“name”:”Charlie”} |rn+——————————————————————-+’), (u’language’, u”), (u’caption’, <wagtail.wagtailcore.rich_text.RichText object at 0x3ea8d7b65b90>)])]

Now, what if we want to modify or add a field to the JSON data?

You can now update the data with the new JSON_SET function. And you have the ability to mix and match JSON functions together to achieve desired results.

For example, the query below adds a new field, “region_code”, to the table. The value of the field will be “America” if the value of the “country” field is “USA”, and “Other” if it is not.

code_block[StructValue([(u’code’, u’– Updating/Adding a field is easy to do as well. The structure will be automatically created (see “Charlie” row)rnrnQuery:rnrnUPDATE dataset_name.users_sample SET user = JSON_SET(user, “$.address.region_code”, IF(LAX_STRING(user.address.country) = “USA”, “America”, “Other”)) WHERE truernAfter the query above; if you run the query u201cSELECT * FROM dataset_name.users_sample ORDER BY STRING(user.name);u201d, you will receive the following Output:rnrn+——————————————————————————————-+rn| user |rn+——————————————————————————————-+rn| {“address”:{“city”:”SF”,”country”:”USA”,”region_code”:”America”},”age”:28,”name”:”Alice”} |rn| {“address”:{“country”:”Germany”,”region_code”:”Other”},”age”:”40″,”name”:”Bob”} |rn| {“address”:{“region_code”:”Other”},”name”:”Charlie”} |rn+——————————————————————————————-+’), (u’language’, u”), (u’caption’, <wagtail.wagtailcore.rich_text.RichText object at 0x3ea8d7b6da10>)])]

Last but not least, let’s say you have a table of property/value pairs you want to convert to a JSON object. With the new JSON_OBJECT constructor function, you can effortlessly create the new JSON object.

code_block[StructValue([(u’code’, u’Query:rnrnWITH Fruits AS (rnSELECT 0 AS id, ‘color’ AS k, ‘Red’ AS v UNION ALLrnSELECT 0, ‘fruit’, ‘apple’ UNION ALLrnSELECT 1, ‘fruit’,’banana’ UNION ALLrnSELECT 1, ‘ripe’, ‘true’rn)rnSELECT JSON_OBJECT(ARRAY_AGG(k), ARRAY_AGG(v)) AS json_datarnFROM FruitsrnGROUP BY idrnOutput:rn+———————————-+rn| json_data |rn+———————————-+rn| {“color”:”Red”,”fruit”:”apple”} |rn| {“fruit”:”banana”,”ripe”:”true”} |rn+———————————-+’), (u’language’, u”), (u’caption’, <wagtail.wagtailcore.rich_text.RichText object at 0x3ea8d7b6dad0>)])]

Complete list of functions

Lax conversion functions:

JSON constructor functions:

JSON_ARRAY

JSON_OBJECT

JSON mutator functions:

Try it out!

Google BigQuery is constantly adding new features to make it easier and more powerful to analyze your data. We encourage you to check them out and provide your feedback to us as we continue to develop additional features and capabilities to make working JSON easier and faster over time.

Cloud BlogRead More

Previous articleHow Meta is improving password security and preserving privacy

Next articleHow to Run Windscribe VPN in Windows with Python

Introducing new SQL functions to manipulate your JSON data in BigQuery

Complete list of functions

Try it out!

The overwhelmed person’s guide to Google Cloud: week of April 18

Introducing new ML model monitoring capabilities in BigQuery

The power of choice: Simplifying your regulatory and compliance journey

LEAVE A REPLY Cancel reply

Most Popular

The overwhelmed person’s guide to Google Cloud: week of April 18

Databricks DBRX is now available in Amazon SageMaker JumpStart

Introducing new ML model monitoring capabilities in BigQuery

Knowledge Bases in Amazon Bedrock now simplifies asking questions on a single document

Recent Comments

EDITOR PICKS

Exploring the Click Element Variable in Google Tag Manager

How to track events with Google Tag Manager and Google Analytics

Data Layer Variable in GTM: What, Why, and Where?

POPULAR POSTS

Introducing Amazon RDS Extended Support for MySQL databases on Amazon Aurora and Amazon RDS

Knowledge Bases for Amazon Bedrock now supports metadata filtering to improve retrieval accuracy

Private, secure, and seamless connectivity to Cloud SQL using Private Service Connect

POPULAR CATEGORY