Tuesday, September 17, 2024
No menu items!
HomeCloud ComputingBigQuery authorized views permissions via Terraform, avoiding the chicken & egg problem

BigQuery authorized views permissions via Terraform, avoiding the chicken & egg problem

Enterprises that use Terraform for spinning up their Infrastructure, including the instantiation of Google BigQuery, can run into a chicken & egg problem if using the IAM access permissions resource blocks for both their Datasets and Authorized Views. 

This problem can cause BigQuery operational issues across an organization, creating an unpleasant experience for the end-user due to the momentary loss of access to the data. End users without access to “private data” are likely to rely on the Authorized views to a great extent. 

This blog post shows how to avoid running into the problem and provides a step-by-step guide to correctly managing Authorized View permissions via Terraform. This publication has three components; Use case, problem statement, and solution.

1. Use case

The use case at hand involves 2 products, Google Cloud BigQuery and Hashicorp Terraform. Let’s look at both in light of the use case, one by one.

BigQuery is Google Cloud’s fully managed enterprise data warehouse that helps you manage and analyze your data with built-in features like machine learning, geospatial analysis, and business intelligence. To consume and take advantage of BigQuery, you need datasets. 

Datasets are logical containers (contained within a specific project) that are used to organize and control access to your BigQuery resources. Datasets are similar to schemas in other database systems. A table or view must belong to a dataset, so you need to create at least one dataset before loading data into BigQuery. 

Cloud IAM can restrict members’ access to table levels but not to “parts of a table.” Suppose you have a use case where you want a member with a data viewer role to query / access specific information in a table, like an employee’s name and job title by department, without having access to the address of every employee. In that case, you can create a BigQuery authorized view. An authorized view lets you share query results with particular users and groups without giving them access to the underlying source data.

The industry standard for infrastructure provisioning on Google Cloud is via Terraform tool by HashiCorp.Terraform is used to instantiate all infrastructure components and supports BigQuery resources. To manage IAM policies for BigQuery datasets, Terraform has three different resources: google_bigquery_dataset_iam_policy, google_bigquery_dataset_iam_binding, and google_bigquery_dataset_iam_member

2. Problem statement

These BigQuery resources are intended to convert the permissions system for BigQuery datasets to the standard IAM interface. Still, there is a warning note as part of the Terraform documentation: “Using any of these resources will remove any authorized view permissions from the dataset. To assign and preserve authorized view permissions, use the google_bigquery_dataset_access instead.”

As the note said, these resources work well in some scenarios but not for “Authorized Views” permissions. The Google Terraform resources to manage IAM policy for a BigQuery dataset each have respective unique use cases:

google_bigquery_dataset_iam_policy: Authoritative. Sets the IAM policy for the dataset and replaces any existing policy already attached.

google_bigquery_dataset_iam_binding: Authoritative for a given role. Updates the IAM policy to grant a role to a list of members. Other roles within the IAM policy for the dataset are preserved.

google_bigquery_dataset_iam_member: Non-authoritative. Updates the IAM policy to grant a role to a new member. Other members for the role for the dataset are preserved.

Using any of these resources together with an authorized view will remove the permissions from the dataset. If any of these resources are used in conjunction with the “google_bigquery_dataset_access” resource or the “access” field on the “google_bigquery_dataset” resource, we will end up in a race condition where these resources will fight over which permissions take precedence. So, this essentially means that if we try to create and assign permissions to authorized views simultaneously as dataset creation from within the Terraform code, we will end up with a chicken & egg problem where there will be a dispute between the dataset and authorized views policy, causing the authorized views permissions to be wiped out as a result.

Lets see the issue re-creation in action below.

Terraform BigQuery  – dataset, table and authorized view resources
Terraform BigQuery  – table IAM policy resource

We can confirm the creation works with following query and Console screenshot:

From the Google Cloud console we can see the created dataset, the authorized view and the dummy SA

Google Cloud console  – Authorized view BQ dataset
Google Cloud console  – Authorized view permissions

Now we add a new user to the source dataset with the following code.

This revokes the authorized view and the “dummy terraform” SA loses its previously functional access.

Google Cloud console  – Authorized view BQ dataset

As we discussed previously, this will be the behavior due to how IAM is implemented on BQ datasets; we need to consider all constraints around the IAM policy for BigQuery dataset and design our Terraform with the google_bigquery resource that best fits our needs. For our scenario, the resource that helped us resolve this issue is google_bigquery_dataset_access; this resource gives dataset access for a single entity and is intended to be used in cases where it is not possible to compile a complete list of access blocks to include in a google_bigquery_dataset resource and is the recommended resource when creating authorized views.

Referring to the HCL code below, we have created a module for the dataset access resource; due to the nature of google_bigquery_dataset_access of giving access to a single entity. We are looping through a list of datasets and passing the dataset details to the module; this helped us avoid removing any authorized views from that dataset.

Terraform – module/dataset_access/main.tf
Terraform – module/dataset_access/output.tf
Terraform – module/dataset_access/variables.tf
Terraform – example/main.tf
Terraform – example/terraform.tfvars

In conclusion, how BigQuery implements IAM via Terraform is unique and different from how we do IAM for other Google Cloud services. It is essential first to understand the architecture of a specific BigQuery implementation and then feed that into deciding which BQ TF IAM resource(s) to use. 

We encourage you to read more about creating Authorized views and take a look at all the available Terraform blueprints for Google Cloud at the following links. 

Create an authorized viewTerraform blueprints and modules for Google Cloud

Cloud BlogRead More

RELATED ARTICLES

LEAVE A REPLY

Please enter your comment!
Please enter your name here

Most Popular

Recent Comments