Every company these days is becoming a data company, whether it knows it or not. As a result, companies need to build an ecosystem for data processing. Traditionally, organisations’ data ecosystems consisted of point solutions, each providing a specific data service. For example, one of the most common questions we get from customers is, “Do I need a data lake, or should I consider a data warehouse? Do you recommend I consider both?” Traditionally, these two architectures have been viewed as separate systems, applicable to specific data types and user skill sets. Increasingly, we are seeing a blurring of lines between data warehouses and data lakes, which gives customers an opportunity to create a more comprehensive platform that offers the best of both worlds.
What if we don’t need to compromise? What if we could create an end-to-end solution covering every stage of data management and processing, from data collection to data analysis and machine learning? Such a data platform can store vast amounts of data in varying formats without compromising on latency, while serving all users throughout the data lifecycle.
There are several data solutions and architectures that we are seeing now or anticipating. Emerging concepts include data lakehouses, data meshes, and data vaults. Some are not new and have been around in different shapes and forms; however, all of them work naturally within a Google Cloud environment. Let’s look at both ends of the spectrum: enabling data and enabling teams.
A data mesh facilitates a decentralized approach to data ownership, allowing individual lines of business to publish and subscribe to data in a standardized manner, instead of forcing data access and stewardship through a single, centralized team. A data lakehouse, on the other hand, brings raw and processed data closer together, allowing for a more streamlined and centralized repository of the data needed throughout the organization. Processing can be done via ELT, reducing the need to copy datasets across systems. This allows for easier data exploration and easier governance. A data vault is designed to separate the data-driven and model-driven parts, so the way data is integrated in the raw vault enables parallel loading, and large implementations can scale out easily.
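To make the parallel-loading point about data vaults more concrete, here is a minimal sketch in Python. It assumes the common Data Vault convention of deriving keys by hashing business keys, so that hubs and links can each compute their own keys without waiting on one another. The table names, the `customer_id` and `order_id` fields, the choice of MD5, and all function names are hypothetical illustrations, not part of any Google Cloud product.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def hash_key(*business_keys: str) -> str:
    """Derive a deterministic key from one or more business keys."""
    normalized = "|".join(k.strip().upper() for k in business_keys)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


def load_customer_hub(rows):
    # Hub: one entry per distinct customer business key.
    return {hash_key(r["customer_id"]): r["customer_id"] for r in rows}


def load_order_hub(rows):
    # Hub: one entry per distinct order business key.
    return {hash_key(r["order_id"]): r["order_id"] for r in rows}


def load_customer_order_link(rows):
    # Link: computes both hub keys itself, so it needs no lookup
    # against the hubs and can load at the same time as they do.
    return {
        hash_key(r["customer_id"], r["order_id"]): (
            hash_key(r["customer_id"]),
            hash_key(r["order_id"]),
        )
        for r in rows
    }


def load_raw_vault(rows):
    """Run every loader concurrently; none depends on another's output."""
    loaders = {
        "hub_customer": load_customer_hub,
        "hub_order": load_order_hub,
        "link_customer_order": load_customer_order_link,
    }
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(fn, rows) for name, fn in loaders.items()}
        return {name: future.result() for name, future in futures.items()}
```

Because each loader derives its keys from the source rows alone, adding more hubs or links means adding more independent tasks, which is one way to read the claim that large implementations can scale out.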
In Google Cloud, there is no need to keep your data lake and data warehouse separate. In fact, with interoperability among our portfolio of data analytics products, you can easily provide access to data residing in different places, effectively bringing your data lake and data warehouse together on a single platform.
Let’s look at some of the technological innovations that make this a reality. The BigQuery Storage API lets you treat a data warehouse like a data lake by giving other engines direct access to the data residing in BigQuery. For example, you can use Spark to access data residing in the data warehouse without affecting the performance of other jobs accessing it. This is all made possible by the underlying architecture, which separates compute and storage.
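As a rough illustration of the Spark example above, the sketch below reads a BigQuery table into a Spark DataFrame. It assumes the open-source spark-bigquery connector is available to the Spark session, registered under the `bigquery` format name with a `table` option. The helper names `bigquery_table_id` and `read_bigquery_table` are hypothetical and exist only for this example.

```python
from typing import Any


def bigquery_table_id(project: str, dataset: str, table: str) -> str:
    """Build a fully qualified table ID in project.dataset.table form."""
    return f"{project}.{dataset}.{table}"


def read_bigquery_table(spark: Any, project: str, dataset: str, table: str) -> Any:
    """Return a Spark DataFrame backed by a BigQuery table.

    Assumes `spark` is a SparkSession with the spark-bigquery
    connector on its classpath.
    """
    return (
        spark.read.format("bigquery")
        .option("table", bigquery_table_id(project, dataset, table))
        .load()
    )
```

With a real `SparkSession` and Google Cloud credentials in place, a call such as `read_bigquery_table(spark, "bigquery-public-data", "samples", "shakespeare")` would return a DataFrame that Spark can process with its own compute resources.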
We continue to offer specialized products and solutions around data lake and data warehouse functionality but over time we expect to see a significant enough convergence of the two systems that the terminology will change. At Google Cloud, we consider this combination an “analytics data platform”.
Tactical or Strategic?
The key differentiators of Google Cloud’s data analytics platform are that it is open, intelligent, flexible, and tightly integrated. Many technologies on the market provide tactical solutions that may feel comfortable and familiar. However, this can be a rather short-term approach that simply lifts and shifts a siloed solution into the cloud. In contrast, an analytics data platform built on Google Cloud provides modern data warehousing and data lake capabilities with close integration to our AI Platform. It also provides built-in streaming, ML, and geospatial capabilities, as well as an in-memory solution for BI use cases. Depending on your organizational data needs, Google Cloud has the set of products, tools, and services to create the right data platform for you.
To become a truly data-driven organization, the first step is to design and implement an analytics data platform that meets your technical and business needs. Whether you want to empower teams to own, publish, and share their data across the organization, or you want to create a streamlined store of raw and processed data for easier discovery, there is a solution that best meets the needs of your company.
To learn more about the elements of a unified analytics data platform built on Google Cloud, and the differences in platform architectures and organizational structures, read our Unified Analytics Platform paper.