Enterprises are generating data at an exponential rate, spanning traditional structured transactional data, semi-structured like JSON and unstructured data like images and audio. Beyond the scale of the data, these divergent types present processing challenges for developers, at times requiring separate processing flows for each. With its initial release BigQuery’s support for semi-structured JSON eliminated the need for such complex preprocessing and providing schema flexibility, intuitive querying and the scalability benefits afforded to structured data.
Today, we are excited to announce the release of new SQL functions for BigQuery JSON, extending the power and flexibility of our core JSON support. These functions make it even easier to extract and construct JSON data and perform complex data analysis.
With these new query functions, you can:
Convert JSON values into primitive types (INT64, FLOAT64, BOOL and STRING) in an easier and more flexible way with the new JSON LAX functions
Easily update and modify an existing JSON value in BigQuery with the new JSON mutator functions.
Construct JSON object and JSON array with SQL in BigQuery with the new JSON constructor functions.
Let’s review these new features and some examples of how to use them. First, we will create a table for demonstration.
code_block[StructValue([(u’code’, u’CREATE TABLE dataset_name.users_sample AS (rn SELECT JSON ‘{“name”: “Alice”, “age”: 28, “address”: {“country”: “USA”, “city”: “SF”, “zipcode”: 94102}}’ AS user UNION ALLrn SELECT JSON ‘{“name”: “Bob”, “age”: “40”, “address”: {“country”: “Germany”}}’ UNION ALLrn SELECT JSON ‘{“name”: “Charlie”, “age”: null, “address”: {“zipcode”: 12356, “country”: null}}’rn)’), (u’language’, u”), (u’caption’, <wagtail.wagtailcore.rich_text.RichText object at 0x3ea8d7b55a10>)])]
code_block[StructValue([(u’code’, u’Query:rnrn– Table contentsrnSELECT * FROM dataset_name.users_sample ORDER BY STRING(user.name);rnOutput:rn+———————————————————————————–+rn| user |rn+———————————————————————————–+rn| {“address”:{“city”:”SF”,”country”:”USA”,”zipcode”:94102},”age”:28,”name”:”Alice”} |rn| {“address”:{“country”:”Germany”},”age”:”40″,”name”:”Bob”} |rn| {“address”:{“country”:null,”zipcode”:12356},”age”:null,”name”:”Charlie”} |rn+———————————————————————————–+’), (u’language’, u”), (u’caption’, <wagtail.wagtailcore.rich_text.RichText object at 0x3ea8d7b55b90>)])]
Great! Let’s say we want to get a list of all users over 30. Looking at the table, you will see that user.age contains a JSON number in the first record, a JSON string in the second, and a JSON null in the third. With the new powerful LAX function, LAX_INT64, all types are automatically inferred and processed correctly.
code_block[StructValue([(u’code’, u’Query:rnrnrnSELECT user.name FROM dataset_name.users_samplernWHERE LAX_INT64(user.age) > 30rnOutput:rn+——-+rn| name |rn+——-+rn| “Bob” |rn+——-+’), (u’language’, u”), (u’caption’, <wagtail.wagtailcore.rich_text.RichText object at 0x3ea8d7b55dd0>)])]
Unlike the “strict” conversion functions, which require that the JSON type matches the primitive type exactly, the “lax” conversion functions will also handle conversions between mismatched data types.
For example, the strict conversion function below would return an error:
code_block[StructValue([(u’code’, u’Query:rnrnSELECT INT64(JSON ‘”10″‘) AS strict_int64rnOutput:rnError: The provided JSON input is not an integer’), (u’language’, u”), (u’caption’, <wagtail.wagtailcore.rich_text.RichText object at 0x3ea8d7c65ad0>)])]
However, the lax conversion function below would return the desired result:
code_block[StructValue([(u’code’, u’Query:rnrnSELECT LAX_INT64(JSON ‘”10″‘) AS lax_int64rnOutput:rn+———–+rn| lax_int64 |rn+———–+rn| 10 |rn+———–+’), (u’language’, u”), (u’caption’, <wagtail.wagtailcore.rich_text.RichText object at 0x3ea8d7c659d0>)])]
Furthermore, you can quickly and easily remove a field in the JSON data by using JSON_REMOVE functions.
code_block[StructValue([(u’code’, u’Query:rnrnrnUPDATE dataset_name.users_sample SET user = JSON_REMOVE(user, “$.address.zipcode”)rnWHERE truernAfter the query above; if you run the query u201cSELECT * FROM dataset_name.users_sample ORDER BY STRING(user.name);u201d, you will receive the following Output:rnrnrn+——————————————————————-+rn| user |rn+——————————————————————-+rn| {“address”:{“city”:”SF”,”country”:”USA”},”age”:28,”name”:”Alice”} |rn| {“address”:{“country”:”Germany”},”age”:”40″,”name”:”Bob”} |rn| {“address”:{“country”:null},”age”:null,”name”:”Charlie”} |rn+——————————————————————-+’), (u’language’, u”), (u’caption’, <wagtail.wagtailcore.rich_text.RichText object at 0x3ea8d7c65710>)])]
JSON_STRIP_NULLS compresses the data by removing JSON nulls. Although BigQuery null values neither impact performance nor storage cost, it can be helpful for reducing data size during exports.
code_block[StructValue([(u’code’, u’Query:rnrnrnUPDATE dataset_name.users_sample SET user = JSON_STRIP_NULLS(user, remove_empty=>true) WHERE truernAfter the query above; if you run the query u201cSELECT * FROM dataset_name.users_sample ORDER BY STRING(user.name);u201d, you will receive the following Output:rnrnrn+——————————————————————-+rn| user |rn+——————————————————————-+rn| {“address”:{“city”:”SF”,”country”:”USA”},”age”:28,”name”:”Alice”} |rn| {“address”:{“country”:”Germany”},”age”:”40″,”name”:”Bob”} |rn| {“name”:”Charlie”} |rn+——————————————————————-+’), (u’language’, u”), (u’caption’, <wagtail.wagtailcore.rich_text.RichText object at 0x3ea8d7b65b90>)])]
Now, what if we want to modify or add a field to the JSON data?
You can now update the data with the new JSON_SET function. And you have the ability to mix and match JSON functions together to achieve desired results.
For example, the query below adds a new field, “region_code”, to the table. The value of the field will be “America” if the value of the “country” field is “USA”, and “Other” if it is not.
code_block[StructValue([(u’code’, u’– Updating/Adding a field is easy to do as well. The structure will be automatically created (see “Charlie” row)rnrnQuery:rnrnUPDATE dataset_name.users_sample SET user = JSON_SET(user, “$.address.region_code”, IF(LAX_STRING(user.address.country) = “USA”, “America”, “Other”)) WHERE truernAfter the query above; if you run the query u201cSELECT * FROM dataset_name.users_sample ORDER BY STRING(user.name);u201d, you will receive the following Output:rnrn+——————————————————————————————-+rn| user |rn+——————————————————————————————-+rn| {“address”:{“city”:”SF”,”country”:”USA”,”region_code”:”America”},”age”:28,”name”:”Alice”} |rn| {“address”:{“country”:”Germany”,”region_code”:”Other”},”age”:”40″,”name”:”Bob”} |rn| {“address”:{“region_code”:”Other”},”name”:”Charlie”} |rn+——————————————————————————————-+’), (u’language’, u”), (u’caption’, <wagtail.wagtailcore.rich_text.RichText object at 0x3ea8d7b6da10>)])]
Last but not least, let’s say you have a table of property/value pairs you want to convert to a JSON object. With the new JSON_OBJECT constructor function, you can effortlessly create the new JSON object.
code_block[StructValue([(u’code’, u’Query:rnrnWITH Fruits AS (rnSELECT 0 AS id, ‘color’ AS k, ‘Red’ AS v UNION ALLrnSELECT 0, ‘fruit’, ‘apple’ UNION ALLrnSELECT 1, ‘fruit’,’banana’ UNION ALLrnSELECT 1, ‘ripe’, ‘true’rn)rnSELECT JSON_OBJECT(ARRAY_AGG(k), ARRAY_AGG(v)) AS json_datarnFROM FruitsrnGROUP BY idrnOutput:rn+———————————-+rn| json_data |rn+———————————-+rn| {“color”:”Red”,”fruit”:”apple”} |rn| {“fruit”:”banana”,”ripe”:”true”} |rn+———————————-+’), (u’language’, u”), (u’caption’, <wagtail.wagtailcore.rich_text.RichText object at 0x3ea8d7b6dad0>)])]
Cloud Blog