IDC estimates that by 2025, there will be 175 zettabytes of data in the world, and 80% of that data will be unstructured. However, 90% of unstructured data is never analyzed, because extracting and transforming it can be cumbersome, expensive, and risky, and often requires multiple tools. As a result, unstructured data rarely makes it into organizations’ data pipelines.
Google Cloud’s recent innovations in generative AI, including foundation models for text and vision, open up new ways for data teams to harness this untapped unstructured data. Object tables, a new table type in BigQuery, provide a structured record interface for unstructured data stored in Cloud Storage, unlocking further possibilities.
Today, we are taking it one step further by integrating BigQuery with Vertex AI foundation models, so you can analyze unstructured data right inside BigQuery. The integration brings generative AI directly to where your data resides, which has several benefits:
Eliminates the need to build and manage data pipelines between BigQuery and generative AI model APIs
Streamlines governance and helps reduce the risk of data loss by avoiding data movement
Reduces the need to write and manage custom Python code to call AI models
Enables you to analyze data at petabyte scale without compromising on performance
Can lower your total cost of ownership with a simplified architecture
All this is made possible by the BigQuery ML inference engine, which offers machine learning capabilities right inside BigQuery and recently became generally available. In each of the last two years, BigQuery ML has seen over 250% year-over-year query growth, and this year customers have run over 300 million prediction and training queries in BigQuery ML.
The first supported foundation model is PaLM 2 for text (text-bison). With it, you can write just a few lines of SQL in BigQuery ML to run advanced text processing tasks such as summarization or sentiment analysis on unstructured data, retrieve the results in a structured format, and combine them with other data for further analysis.
How does it work?
Under the hood, the BigQuery ML inference engine uses the ML.GENERATE_TEXT function to call the Vertex AI text-bison model from Model Garden. Using this feature takes two steps:
1. Register the model as a remote model
2. Run inference. For example, you can enrich data by obtaining the country name for a given city name, where “city” is a column in “example_table”.
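As a rough illustration of the two steps above, here is a minimal Python sketch that only builds the corresponding SQL statements as strings; running them still requires a BigQuery connection to Vertex AI. The dataset, model, connection, and table names are hypothetical. The `ENDPOINT = 'text-bison'` option follows later BigQuery ML documentation (earlier preview releases used a `REMOTE_SERVICE_TYPE` option instead), so check the current `CREATE MODEL` reference before running the output.

```python
# Minimal sketch: build the two BigQuery ML statements as SQL strings.
# HYPOTHETICAL names: model `mydataset.llm_model`, connection `us.llm-connection`,
# table `mydataset.example_table`.
# ASSUMPTION: the remote-model OPTIONS syntax (ENDPOINT) matches your BigQuery ML release.


def create_remote_model_sql(model: str, connection: str, endpoint: str = "text-bison") -> str:
    """Step 1: register a Vertex AI text model as a BigQuery ML remote model."""
    return (
        f"CREATE OR REPLACE MODEL `{model}`\n"
        f"  REMOTE WITH CONNECTION `{connection}`\n"
        f"  OPTIONS (ENDPOINT = '{endpoint}');"
    )


def generate_text_sql(model: str, table: str, temperature: float = 0.2,
                      max_output_tokens: int = 100) -> str:
    """Step 2: run inference; ML.GENERATE_TEXT reads its input from a column named `prompt`."""
    return (
        "SELECT *\n"
        "FROM ML.GENERATE_TEXT(\n"
        f"  MODEL `{model}`,\n"
        "  (SELECT CONCAT('Return only the country name for this city: ', city) AS prompt\n"
        f"   FROM `{table}`),\n"
        f"  STRUCT({temperature} AS temperature, {max_output_tokens} AS max_output_tokens));"
    )


if __name__ == "__main__":
    print(create_remote_model_sql("mydataset.llm_model", "us.llm-connection"))
    print(generate_text_sql("mydataset.llm_model", "mydataset.example_table"))
```

Keeping the prompt text and generation parameters (`temperature`, `max_output_tokens`) in one helper makes them easy to tune; the generated answers come back as ordinary rows that can be joined with other tables.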
How customers are leveraging PaLM in BigQuery
Early users of the BigQuery and Vertex AI foundation model integration have shown strong interest in applying it to use cases across industries. For instance, ML.GENERATE_TEXT can simplify advanced data processing tasks such as:
Content generation: Analyze customer feedback and generate personalized email content right inside BigQuery without the need for complex tools
Summarization: Summarize text stored in BigQuery columns such as online reviews or chat transcripts
Data enhancement: Obtain a country name for a given city name
Rephrasing: Correct spelling and grammar in textual content such as voice-to-text transcriptions
Feature extraction: Extract key information or keywords from large text, such as online reviews and call transcripts
Sentiment analysis: Understand human sentiment about specific subjects in a text
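Most of these use cases differ mainly in the prompt. The following minimal sketch maps a few of the tasks above to prompt prefixes and wraps any text column in an ML.GENERATE_TEXT query. The prompt wording and the table and column names are hypothetical, and it assumes the `flatten_json_output` option, which in current BigQuery ML releases returns the generated text in an `ml_generate_text_llm_result` column instead of raw JSON.

```python
# Minimal sketch: one prompt prefix per task, applied to every row of a text column.
# HYPOTHETICAL: prompt wording, table and column names.
# ASSUMPTION: `flatten_json_output` is supported by your BigQuery ML release.

PROMPTS = {
    "summarization": "Summarize the following review in one sentence: ",
    "sentiment": "Classify the sentiment of the following text as positive, negative, or neutral: ",
    "rephrasing": "Correct the spelling and grammar of the following transcript: ",
    "feature_extraction": "List the key words in the following text, comma-separated: ",
}


def sql_string_literal(text: str) -> str:
    """Quote a Python string as a GoogleSQL single-quoted literal using backslash escapes."""
    return "'" + text.replace("\\", "\\\\").replace("'", "\\'") + "'"


def task_query(task: str, model: str, table: str, text_column: str) -> str:
    """Build an ML.GENERATE_TEXT query that applies the prompt for `task` to each row."""
    prefix = sql_string_literal(PROMPTS[task])  # KeyError for unknown tasks
    return (
        "SELECT *\n"
        "FROM ML.GENERATE_TEXT(\n"
        f"  MODEL `{model}`,\n"
        f"  (SELECT *, CONCAT({prefix}, {text_column}) AS prompt FROM `{table}`),\n"
        "  STRUCT(0.0 AS temperature, 256 AS max_output_tokens,\n"
        "         TRUE AS flatten_json_output));"
    )


if __name__ == "__main__":
    print(task_query("sentiment", "mydataset.llm_model", "mydataset.reviews", "review_text"))
```

Passing the original columns through with `SELECT *` keeps each generated answer next to the row it came from, so the output can be filtered or joined like any other table. A low temperature suits classification-style tasks such as sentiment analysis, where consistent answers matter more than variety.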
Faraday, a leading customer prediction platform, previously had to build data pipelines and join multiple datasets. Now they can not only simplify sentiment analysis, but also join customer sentiment with additional first-party customer data and feed it back into the LLM to generate hyper-personalized content, all within BigQuery. Watch this demo video to learn more.
“Faraday’s clients already get the benefit of predictions made from structured data. Now that Google has integrated BigQuery and Vertex AI foundation models, we can scalably predict business outcomes using unstructured data too.” – Seamus Abshere, CTO, Faraday.
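The chained pattern Faraday describes (score sentiment, join it with first-party data, then feed the joined rows back to the model) can be expressed as a single nested query. This is an illustrative sketch only, not Faraday’s implementation: the `mydataset.feedback` and `mydataset.customers` tables, their columns, the model name, and the prompt text are all hypothetical, and it assumes the `flatten_json_output` option returns plain text in an `ml_generate_text_llm_result` column.

```python
# Minimal sketch of a chained ML.GENERATE_TEXT pipeline built as one SQL string.
# HYPOTHETICAL names: model `mydataset.llm_model`, tables
# `mydataset.feedback(customer_id, feedback)` and
# `mydataset.customers(customer_id, first_name, favorite_category)`.
# ASSUMPTION: `flatten_json_output` returns plain text in `ml_generate_text_llm_result`.
# Illustrative only; not Faraday's implementation.


def personalized_email_sql(model: str = "mydataset.llm_model") -> str:
    # Stage 1: classify the sentiment of each piece of customer feedback.
    sentiment = f"""
      SELECT customer_id, TRIM(ml_generate_text_llm_result) AS sentiment
      FROM ML.GENERATE_TEXT(
        MODEL `{model}`,
        (SELECT customer_id,
                CONCAT('Answer with one word, positive or negative: ', feedback) AS prompt
         FROM `mydataset.feedback`),
        STRUCT(0.0 AS temperature, 5 AS max_output_tokens, TRUE AS flatten_json_output))"""

    # Stage 2: join sentiment with first-party data and build a new prompt per row.
    joined = f"""
    SELECT c.customer_id,
           CONCAT('Write a short, friendly email to ', c.first_name,
                  ', who likes ', c.favorite_category,
                  '. Their latest feedback was ', s.sentiment, '.') AS prompt
    FROM `mydataset.customers` AS c
    JOIN ({sentiment}) AS s
      ON c.customer_id = s.customer_id"""

    # Stage 3: generate the personalized content from the joined prompts.
    return f"""SELECT customer_id, ml_generate_text_llm_result AS email
FROM ML.GENERATE_TEXT(
  MODEL `{model}`,
  ({joined}),
  STRUCT(0.7 AS temperature, 256 AS max_output_tokens, TRUE AS flatten_json_output));"""


if __name__ == "__main__":
    print(personalized_email_sql())
```

As written, a customer with several feedback rows gets one email per row; aggregating sentiment per customer before the join would be one way to change that.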
Getting started
To learn more, visit the documentation page, or try out this tutorial to extract keywords from text.