Research shows that over 90% of large organizations already deploy multicloud architectures, and their data is distributed across several public cloud providers. Additionally, data is also increasingly split across various storage systems such as warehouses, operational and relational databases, object stores, etc. With the proliferation of new applications, data is serving many more use cases such as data sciences, business intelligence, analytics, streaming and the list goes on. With these data trends, customers are increasingly gravitating towards an open multicloud data lake. However, multicloud data lakes present several challenges such as data silos, data duplication, fragmented governance, complexity of tools, and increased costs.
With Google’s data cloud technologies, customers can leverage the unique combination of distributed cloud services. They can create an agile cross-cloud semantic business layer with Looker and manage data lakes and data warehouses across cloud environments at scale with BigQuery and capabilities like BigLake and BigQuery Omni.
BigLake is a storage engine that unifies data warehouses and lake houses by standardizing across different storage formats including BigQuery managed table and open file formats such as Parquet and Apache Iceberg on object storage. BigQuery Omni provides the compute engine that runs locally to the storage on AWS or Azure, which customers can use to query data in AWS or Azure seamlessly. This provides several key benefits such as:
A single pane of glass to query your multicloud data lakes (across Google Cloud Platform, Amazon Web Services, and Microsoft Azure)
Cross-cloud analytics by combining data across different platforms with little to no egress costs
Unified governance and secure management of your data wherever it resides
In this blog, we will share cross-cloud analytics use cases customers are solving with Google’s Data Cloud and the benefits they are realizing.
Unified marketing analytics for 360-degree insights
Organizations want to perform marketing analytics – ads optimization, inventory management, churn prediction, buyer propensity trends and many more such analytics. To do this before BigQuery Omni, customers had to use data from several different sources such as Google Analytics, public datasets and other proprietary information stored across cloud environments. This requires moving large amounts of data, managing duplicate copies and incremental costs to perform any cross-cloud analytics and derive actionable insights. With BigQuery Omni, organizations are able to greatly simplify this workflow. Using the familiar BigQuery interface, users can access data residing in AWS or Azure, discover and select just the relevant data that needs to be combined for further analysis. This subset of data can be moved to Google Cloud using Omni’s new Cross-Cloud Transfer capabilities. Customers can combine this data with other Google Cloud datasets and these consolidated tables can be made available to key business stakeholders through advanced analytics tools such as Looker and Looker Studio. Customers are also able to tie in this data now with world class AI models via Vertex AI.
As an illustrative example, consider a retailer who has sales & inventory, user and search data spread across multiple data silos. Using BigQuery Omni they can seamlessly bring these datasets together and power several marketing analytics scenarios like customer segmentation, campaign management and demand forecasting etc.
“Interested in performing cross-cloud analytics, we tested BigQuery Omni and really liked the SQL support to easily get data from AWS S3. We have seen great potential and value in BigQuery Omni for adopting a multi-cloud data strategy.” — Florian Valeye, Staff Data Engineer,Back Market, a leading online marketplace for renewed technology based out of France
Data platform with consistent and unified cross-cloud governance
Another pattern is customers looking to analyze operational, transactional and business data across data silos in different clouds through a unified data platform. These data silos are a result of various factors such as merger and acquisitions, standardization of analytical tools, leveraging best of breed solutions in different clouds and diversification of data footprint across clouds. In addition to a single pane of glass for data access across silos, customers deeply desire consistent and uniform governance of their data across clouds.
With BigLake and BigQuery Omni abstracting the storage and compute layers respectively, organizations can access and query their data in Google Cloud irrespective of where it resides. They can also set fine-grained row level and column access policies in BigQuery and consistently govern it across clouds. These building blocks enable data engineering teams to build a unified and governed data platform for their data users without having to deal with the complexity of building and managing complex data pipelines. Furthermore, with BigQuery Omni’s integration with Dataplex and Data Catalog, you can discover, search your data across clouds and enrich your data by adding relevant business context with business glossary and rich text.
“Several SADA customers use GCP to build and manage their data analytics platform. During many explorations and proofs of concepts, our customers have seen the great potential and value in BigQuery Omni. Enabling seamless cross-cloud data analytics has allowed them to realize the value of their data quicker while lowering the barrier to entry for BigQuery adoption in a low-risk fashion.” — Brian Suk, Associate Chief Technology Officer,SADA, one of the strategic partners of Google Cloud.
Simplified data sharing between data providers and their customers
A third emerging pattern in cross cloud analytics is data sharing. Several services have the business need to share information such as inventory data, subscriber data to their customers or users who in turn analyze or aggregate the data with their proprietary data and oftentimes share the results back with the service provider. In several cases, the two parties are on different cloud environments, requiring them to move data back and forth.
Consider an example from a company Action IQ that operates in the customer data platform (CDP) space. CDPs were designed to help activate customer data, and a critical first step of that was unifying and managing that customer data. To enable this, many CDP vendors built their solution choosing one of the available cloud infrastructure technologies and copied data from the client’s systems. “Copying data from client applications and infrastructure has always been a requirement to deploy a CDP, but it doesn’t have to be anymore” — Justin DeBrabant, Senior Vice President of Product, ActionIQ.
While a small percentage of customers are fine with moving data across cloud environments, the majority are hesitant to onboard new services and would rather prefer providing governed access to their data sets.
“A new architectural pattern is emerging, allowing organizations to keep their data at one location and make it accessible, with the proper guardrails, to applications used by the rest of the organization’s stack”addsJustin at ActionIQ.
With BigQuery Omni, services in Google Cloud Platform can more easily access and share data with their customers and users in other cloud environments with limited data movement. One of UK’s largest statistics providers has explored Omni for their data sharing needs.
“We tested BigQuery Omni and really like the ability to get data from AWS directly into BQ. We’re excited about managing data sharing with different organizations without onboarding new clouds” – Simon Sandford-Taylor, Chief Information and Digital Officer, UK’s Office for National Statistics
With BigQuery Omni, customers are able to:
Access and query data across clouds through a single user interface
Reduce the need for data engineering before analyzing data
Lower operational overhead and risks by deploying an application that runs across multiple clouds which leverages the same, consistent security controls
Accelerate access to insights by significantly reducing the time for data processing and analysis
Create consistent and predictable budgeting across multiple cloud footprints
Enable long term agility and maximize the benefits every cloud investment
Over the last year, we’ve seen great momentum in customer adoption and added significant innovations to BigQuery Omni including improved performance and scalability for querying your data in AWS S3 or Azure Blob Storage, Iceberg support for Omni, Larger query result set size up to 10GB and Cross-cloud transfer that helps customers easily, securely, and cost effectively move just enough data across cloud environments for advanced analytics.
BigQuery Omni has launched several features to support unified governance of your data across multiple clouds – you can get fine-grained access to your multi-cloud data with row level and column level security. Building on this, we are excited to announce that BigQuery Omni now supports data masking. We’ve also made it easy for customers to try and see the benefits of BigQuery Omni through the limited time free trial available until March 30, 2023.
BigQuery Omni running on other public clouds outside of Google Cloud is available in AWS US East1 (N.Virginia) and Azure US East2 (US East) regions. We are also excited to share that we will be bringing BigQuery Omni to more regions in the future, starting with Asia Pacific (AWS Korea) coming soon.
Get started with a free trial to learn about Omni. Check out the documentation to learn more about BigQuery Omni. You can also leverage the self paced labs to learn how to set up BigQuery Omni easily.
Cloud BlogRead More