Tuesday, April 16, 2024
No menu items!
HomeCloud ComputingAutomate data governance, extend your data fabric with Dataplex-BigLake integration

Automate data governance, extend your data fabric with Dataplex-BigLake integration

Unlocking the full potential of data requires breaking down the silo between open-source data formats and data warehouses. At the same time, it is critical to enable data governance team to apply policies regardless of where the data happens, whether – on file  or columnar storage. 

Today,  data governance teams have to become subject matter experts on each storage system the corporate data happens to reside on. Since February 2022,  Dataplex has offered a unified  place to apply policies, which are propagated across both lake storage and data warehouses in GCP. Rather than specifying policies in multiple places, bearing the cognitive load of translating policies from “what you want the storage system to do” to “how your data should behave” Dataplex offers a single point for unambiguous policy management.  Now, we are making it easier for you to use BigLake.  

Earlier this year, we launched BigLake into general availability, BigLake unifies data fabric between Data Lakes and Data Warehouses by extending BigQuery storage to open file formats. Today, we announce BigLake Integration with Dataplex (available in preview). This integration eliminates the configuration steps for the admin taking advantage of BigLake and managing policies across GCS and BigQuery from a unified console. 

Previously,  you could point Dataplex at a Google Cloud Storage (GCS) bucket, and Dataplex will discover and extract all metadata from the data lake and register this metadata in BigQuery (and Dataproc Metastore, Data Catalog) for analysis and search. With the BigLake integration capability, we are building on this capability by allowing an “upgrade” of a bucket asset, and instead of just creating external tables in BigQuery for analysis – Dataplex will create policy-capable BigLake tables! 

The immediate implication is that admins can now assign column, row, and table policies to the BigLake tables auto-created by Dataplex, as with BigLake – the infrastructure (GCS) layer is separate from the analysis layer (BigQuery). Dataplex will handle the creation of a BigQuery connection and a BigQuery publishing dataset and ensure the BigQuery service account has the correct permissions on the bucket.

But wait – there’s more.

With this release of Dataplex, we are also introducing advanced logging called governance logs.  Governance logs allow tracking the exact state of policy propagation to tables and columns – adding an additional level of detail going beyond the high-level “status” for the bucket and into fine-grained status and logs for tables, columns. 

What’s next? 

We have updated our documentation for managing buckets and have additional detail regarding policy propagation and the upgrade process.

Stay tuned for an exciting  roadmap ahead, with more automation around policy management.

For more information, please visit:

Google Cloud Dataplex

Cloud BlogRead More

RELATED ARTICLES

LEAVE A REPLY

Please enter your comment!
Please enter your name here

Most Popular

Recent Comments