Thursday, May 2, 2024

A guide on defining tenancy strategy for Composer environment

Getting Started

Customers can have numerous data analytics teams within a single organization that each require workflow or data pipeline orchestration. It is important to evaluate the tenancy design of your implementation to improve the efficiency, scalability, and security of your organization.

Google Cloud offers Cloud Composer, a fully managed workflow orchestration service built on Apache Airflow that offers end-to-end integration with Google Cloud products including BigQuery, Dataflow, Dataproc, Datastore, Cloud Storage, Pub/Sub, and Vertex AI.

This guide compares the pros and cons of different tenancy strategies for Cloud Composer. We’ll evaluate the differences between a multi-tenant Composer strategy and a single-tenant Composer strategy. In other words: one shared Composer environment for all data teams, or a dedicated Composer environment for each data team.

Note: this document uses the Composer Environment Recommended Presets (Large, Medium, Small) and the Composer 2 Pricing Model when comparing the per-DAG costs of a single Large Composer environment vs. many smaller environments.

The case for a multi-tenant Composer environment

Pros of multi-tenancy

+ Centralized governance

Configuration changes, permission issues, and RBAC assignments for a single environment all live within one Google Cloud project, so a central platform team can manage these requests across data teams.

+ Best per-DAG cost

It may be more cost-effective on a per-DAG basis to run one Large environment preset instead of two or more Medium presets or three or more Small presets.
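The per-DAG comparison is simple arithmetic: total monthly environment cost divided by the number of DAGs the environments run. The minimal sketch below computes it for a shared layout and a split layout. The `Environment` class, the `per_dag_cost` helper, and every price and DAG count are hypothetical placeholders, not figures from the Composer 2 pricing model. Substitute real numbers before drawing conclusions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Environment:
    """One Composer environment in a hypothetical layout."""
    preset: str          # "Large", "Medium", or "Small"
    monthly_cost: float  # placeholder, NOT a real Composer 2 price
    dag_count: int       # DAGs deployed to this environment


def per_dag_cost(layout: list[Environment]) -> float:
    """Total monthly cost of a layout divided by the number of DAGs it runs."""
    total_cost = sum(env.monthly_cost for env in layout)
    total_dags = sum(env.dag_count for env in layout)
    if total_dags == 0:
        raise ValueError("layout runs no DAGs")
    return total_cost / total_dags


# Hypothetical inputs: the same 900 DAGs on one Large preset or two Medium presets.
shared = [Environment("Large", monthly_cost=3000.0, dag_count=900)]
split = [
    Environment("Medium", monthly_cost=1800.0, dag_count=450),
    Environment("Medium", monthly_cost=1800.0, dag_count=450),
]

print(f"one Large:  {per_dag_cost(shared):.2f} per DAG")
print(f"two Medium: {per_dag_cost(split):.2f} per DAG")
```

With these made-up inputs, the shared Large layout comes out cheaper per DAG. Whether that holds for your organization depends entirely on the real prices and DAG counts you plug in.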

+ Unified CI/CD

A single CI/CD pipeline will perform your testing, validation, and DAG deployments across data teams.

Cons of multi-tenancy

– Insufficient isolation

A single default service account for all DAGs merges every team and its authorizations into one large authorization scope. Because of this product limitation, one team can write a DAG that touches another team’s data.

– Single point of failure

This architecture increases the potential impact of errors and failures. It also exposes every team to noisy neighbors (co-tenants that monopolize resources), which reduces reliability. A single point of failure shared across data lakes is concerning. Mitigate the potential risks by:

- Keeping the combined DAG count, Max Concurrent DAG Runs, and Max Concurrent Tasks within recommended limits
- Following HA best practices (two schedulers, autoscaling)
- Setting up a robust snapshot-based disaster recovery strategy
- Making sure users recognize the SLAs for configuration and permission changes
- Setting and acknowledging maintenance windows
- Thoroughly testing and validating DAGs in lower environments before deploying them to production

– Limited scalability

Google Cloud best practices suggest sizing the Large Composer environment preset for 1,000 total DAGs, 250 Max Concurrent DAG Runs, and 400 Max Concurrent Tasks. With many data lakes sharing a single environment, the likelihood of hitting these soft limits is much greater.
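One way to see how quickly a shared environment approaches these soft limits is to add up each team's expected load and compare the totals with the Large preset guidance. The minimal sketch below does that. The team names, figures, and the `exceeded_limits` helper are hypothetical. Summing per-team peak concurrency also assumes that all teams' peaks coincide, which is deliberately pessimistic.

```python
# Soft limits suggested for the Large preset, as quoted above.
LARGE_PRESET_LIMITS = {
    "total_dags": 1000,
    "max_concurrent_dag_runs": 250,
    "max_concurrent_tasks": 400,
}


def exceeded_limits(team_loads: dict[str, dict[str, int]],
                    limits: dict[str, int] = LARGE_PRESET_LIMITS) -> dict[str, int]:
    """Combine every team's load and return each metric whose combined value
    exceeds its limit, mapped to that combined value.

    Summing peak concurrency treats all teams' peaks as simultaneous
    (a worst-case assumption).
    """
    combined = {metric: 0 for metric in limits}
    for load in team_loads.values():
        for metric in limits:
            combined[metric] += load.get(metric, 0)
    return {m: v for m, v in combined.items() if v > limits[m]}


# Hypothetical per-team figures for a proposed shared environment.
teams = {
    "marketing": {"total_dags": 400, "max_concurrent_dag_runs": 90, "max_concurrent_tasks": 150},
    "finance": {"total_dags": 350, "max_concurrent_dag_runs": 80, "max_concurrent_tasks": 160},
    "logistics": {"total_dags": 300, "max_concurrent_dag_runs": 60, "max_concurrent_tasks": 120},
}
print(exceeded_limits(teams))
```

Here, three teams that would each fit comfortably in their own environment together push the shared environment past both the DAG-count and the concurrent-task guidance.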

However, there are a variety of ways to maximize environment scalability:

- Use the Large environment preset with the largest machine types for each Airflow component
- Use the Airflow Database Cleanup DAG for periodic database maintenance
- Optimize Cloud Composer via better Airflow DAGs
- Maximize the benefits of Cloud Composer and reduce parse times
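A cleanup job such as the Database Cleanup DAG keeps the metadata database small by deleting entries older than a retention period. The sketch below shows only that underlying idea, age-based deletion against a retention window, using an in-memory SQLite table. The `dag_run_history` table, its columns, the `purge_older_than` helper, and the 30-day window are all hypothetical. This is not the Cleanup DAG's code, and the table is not Airflow's real schema.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

# Table names cannot be bound as SQL parameters, so they are checked
# against an allow-list before being formatted into the statement.
PURGEABLE_TABLES = {"dag_run_history"}


def purge_older_than(conn: sqlite3.Connection, table: str, retention_days: int,
                     now: datetime) -> int:
    """Delete rows whose run_ts (epoch seconds) falls before the retention
    window and return the number of rows deleted."""
    if table not in PURGEABLE_TABLES:
        raise ValueError(f"refusing to purge unknown table {table!r}")
    cutoff = int((now - timedelta(days=retention_days)).timestamp())
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(f"DELETE FROM {table} WHERE run_ts < ?", (cutoff,))
    return cur.rowcount


# Simplified stand-in for run history (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dag_run_history (dag_id TEXT, run_ts INTEGER)")
now = datetime(2024, 5, 2, tzinfo=timezone.utc)
conn.executemany(
    "INSERT INTO dag_run_history VALUES (?, ?)",
    [("daily_etl", int((now - timedelta(days=age)).timestamp())) for age in (1, 10, 45, 90)],
)
deleted = purge_older_than(conn, "dag_run_history", retention_days=30, now=now)
remaining = conn.execute("SELECT COUNT(*) FROM dag_run_history").fetchone()[0]
print(deleted, remaining)
```

Choosing the retention window is the real design decision. A shorter window keeps the database smaller but discards run history that teams may still want for debugging.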

– Impact on Snapshots and Upgrades

A single shared Composer environment will experience more issues as its Airflow metadata database grows. The size of the database affects maintenance operations: you will be unable to upgrade the environment once the database exceeds 16 GB, and unable to take snapshots once it exceeds 20 GB.

Use the Airflow Database Cleanup DAG for periodic database maintenance to keep the database below these sizes.
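Because these sizes block upgrades and snapshots outright, it can help to check the database size against them before it becomes a problem. The minimal sketch below does that. It assumes you already have the database size in GB from your own monitoring. The `metadata_db_status` helper and the 80% "at risk" headroom are hypothetical choices, not Composer behavior.

```python
# Size thresholds quoted above for the Airflow metadata database, in GB.
UPGRADE_LIMIT_GB = 16
SNAPSHOT_LIMIT_GB = 20


def metadata_db_status(db_size_gb: float, headroom: float = 0.8) -> dict[str, str]:
    """Classify upgrades and snapshots as "ok", "at risk", or "blocked".

    "at risk" means the database has passed `headroom` (a hypothetical 80%
    by default) of the relevant limit, so cleanup should happen soon.
    """
    status = {}
    for operation, limit in (("upgrade", UPGRADE_LIMIT_GB), ("snapshot", SNAPSHOT_LIMIT_GB)):
        if db_size_gb > limit:
            status[operation] = "blocked"
        elif db_size_gb > headroom * limit:
            status[operation] = "at risk"
        else:
            status[operation] = "ok"
    return status


for size in (5, 14, 18, 21):
    print(size, metadata_db_status(size))
```

Because the upgrade limit is lower than the snapshot limit, upgrades are the first operation to be blocked as a shared environment's database grows.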

– Eventual need for logical groupings

When a single shared environment hits the soft limits listed above, you’ll need to introduce additional Composer environments to scale. One option is to create shared-environment-1, shared-environment-2, shared-environment-3, and so on. But this split is arbitrary and doesn’t reflect logical dependencies between environments. Logical groupings (for example, by data team) help development across DAGs, and this is the beginning of a single-tenant approach.
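The difference between arbitrary shards and logical groupings shows up in how many DAG dependencies end up crossing environment boundaries. The sketch below takes four hypothetical DAGs and assigns them in two ways: round-robin to numbered shared environments, and by owning team. It then counts the dependencies that span environments. All DAG IDs, team names, and the `<team>-composer` naming scheme are illustrative.

```python
from collections import defaultdict


def shard_round_robin(dag_ids: list[str], n_envs: int) -> dict[str, list[str]]:
    """Spread DAGs across arbitrarily numbered shared environments."""
    envs: dict[str, list[str]] = defaultdict(list)
    for i, dag_id in enumerate(dag_ids):
        envs[f"shared-environment-{i % n_envs + 1}"].append(dag_id)
    return dict(envs)


def group_by_owner(dag_owners: dict[str, str]) -> dict[str, list[str]]:
    """Place each DAG in an environment named after its owning team."""
    envs: dict[str, list[str]] = defaultdict(list)
    for dag_id, team in dag_owners.items():
        envs[f"{team}-composer"].append(dag_id)
    return dict(envs)


def cross_env_dependencies(assignment: dict[str, list[str]],
                           deps: list[tuple[str, str]]) -> int:
    """Count upstream -> downstream DAG pairs that land in different environments."""
    env_of = {dag: env for env, dags in assignment.items() for dag in dags}
    return sum(1 for upstream, downstream in deps if env_of[upstream] != env_of[downstream])


# Hypothetical DAGs, owners, and dependencies.
dag_owners = {"sales_ingest": "sales", "sales_report": "sales",
              "hr_sync": "hr", "hr_payroll": "hr"}
deps = [("sales_ingest", "sales_report"), ("hr_sync", "hr_payroll")]

arbitrary = shard_round_robin(list(dag_owners), n_envs=2)
logical = group_by_owner(dag_owners)
print(cross_env_dependencies(arbitrary, deps), cross_env_dependencies(logical, deps))
```

In this toy example, round-robin sharding splits both dependency pairs across environments, while grouping by team keeps each pair together.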

– Opportunity cost

When certain failures occur, your entire Composer environment, and with it your data orchestration solution, may stop operating. The opportunity cost of this situation far outweighs the savings from running one Large preset instead of several Medium or Small presets.

– Slower development

Any configuration or permission request must be approved by the central team. You’ll need response-time SLAs for each data lake to minimize development blockers and disruptions.

The case for dedicated single-tenant Composer environments

Pros of single-tenancy

+ Isolated points of failure

A config or performance failure in one data lake won’t directly affect the DAGs of another data lake.

+ Isolated security

A developer cannot access another team’s DAGs/data without explicitly collaborating and building cross-environment dependencies.

+ Improved developer productivity

Developers gain quality-of-life improvements through faster deployments, configuration changes, and permission assignments. This assumes requests no longer need to go through a central team and that responsibility now lies with the data-lake owners.

+ Logical relationships between data lakes

Handling cross-DAG dependencies between data lakes will be logically straightforward. If you know a given process in Project 1 requires Project 2, you can work directly with the Project 2 Composer environment.

+ Improved performance of your organization-specific Airflow DAGs

Airflow configurations can be tailored to each team’s workloads instead of relying on a general shared environment with generic configurations.

+ Less need for scaling measures

A single data team may not exceed the 1,000-DAG limit in the near future, so scaling the environment won’t be necessary. A single one-time deployment of a Large or Medium environment preset is sufficient. The Airflow metadata database also stays smaller, so snapshots and upgrades are not held back by database size.

+ Organized and simplified logging/monitoring

A single data team will not need to filter Cloud Logging or the Airflow UI down to its own DAGs every time its developers need to run diagnostics.

+ Simplified cost-tracking

Composer-related costs for each data team are confined to a single environment.
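The contrast is easiest to see in how a bill gets attributed. In a shared environment, someone has to split one bill using a usage proxy that the teams agree on. With dedicated environments, each environment's cost already belongs to one team. The sketch below assumes task-hours as the proxy and uses made-up costs. Both helpers are hypothetical.

```python
def allocate_shared_bill(total_cost: float, usage_by_team: dict[str, float]) -> dict[str, float]:
    """Split one shared environment's bill in proportion to a usage proxy.

    The proxy (here, assumed task-hours per team) must be measured and
    agreed on separately; choosing it is part of the governance overhead.
    """
    total_usage = sum(usage_by_team.values())
    if total_usage <= 0:
        raise ValueError("usage must be positive to allocate a bill")
    return {team: total_cost * usage / total_usage for team, usage in usage_by_team.items()}


def dedicated_bills(env_costs: dict[str, float], env_owner: dict[str, str]) -> dict[str, float]:
    """With one environment per team, each environment's cost maps directly to its owner."""
    bills: dict[str, float] = {}
    for env, cost in env_costs.items():
        owner = env_owner[env]
        bills[owner] = bills.get(owner, 0.0) + cost
    return bills


# Hypothetical figures.
print(allocate_shared_bill(1000.0, {"sales": 300.0, "hr": 100.0}))
print(dedicated_bills({"sales-composer": 800.0, "hr-composer": 300.0},
                      {"sales-composer": "sales", "hr-composer": "hr"}))
```

The shared case is only as fair as the chosen proxy. The dedicated case needs no proxy at all.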

+ Scoped impact of maintenance operations

Different teams can be on their own maintenance, snapshot, and environment upgrade schedule to fit their development needs.

Cons of single-tenancy

– Cross data-lake dependencies

If processes that span multiple data lakes are tightly coupled, you’ll need additional development and infrastructure to maintain DAG-to-DAG dependencies. You can accomplish this with Pub/Sub and Airflow sensors. Alternatively, use Cloud Run to intercept Pub/Sub messages, read their content, and decide which Composer environment and DAG to trigger.
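For the Cloud Run option, the routing logic amounts to decoding the Pub/Sub message and mapping its content to a target environment and DAG. The sketch below decodes a Pub/Sub push envelope, in which the payload arrives base64-encoded in `message.data`. It then builds a request for the Airflow stable REST API's `POST /api/v1/dags/{dag_id}/dagRuns` endpoint. The routing table, web server URLs, DAG IDs, and the `event_type` payload field are hypothetical. The sketch stops short of sending the request so that it runs with the standard library alone.

```python
import base64
import json

# Hypothetical routing table: event type -> (Airflow web server URL, DAG ID).
# Real web server URLs come from each Composer environment's details.
ROUTES = {
    "sales.orders_loaded": ("https://sales-airflow.example.com", "finance_reconcile"),
    "hr.roster_updated": ("https://hr-airflow.example.com", "payroll_refresh"),
}


def build_trigger_request(envelope: dict) -> tuple[str, dict]:
    """Decode a Pub/Sub push envelope and return the URL and JSON body of an
    Airflow REST API call that would trigger the matching DAG."""
    message = envelope["message"]
    # Push delivery carries the message payload base64-encoded in "data".
    payload = json.loads(base64.b64decode(message["data"]).decode("utf-8"))
    event_type = payload.get("event_type")  # hypothetical payload field
    if event_type not in ROUTES:
        raise LookupError(f"no route for event type {event_type!r}")
    web_server_url, dag_id = ROUTES[event_type]
    url = f"{web_server_url}/api/v1/dags/{dag_id}/dagRuns"
    # Hand the original payload to the DAG run as its conf.
    return url, {"conf": payload}


# Example envelope, shaped like the body Pub/Sub POSTs to a push endpoint.
payload = {"event_type": "sales.orders_loaded", "business_date": "2024-05-01"}
envelope = {
    "message": {
        "data": base64.b64encode(json.dumps(payload).encode("utf-8")).decode("ascii"),
        "messageId": "1234567890",
    },
    "subscription": "projects/example-project/subscriptions/example-sub",
}
url, body = build_trigger_request(envelope)
print(url)
print(body)
```

In a deployed service, the returned URL and body would be POSTed with credentials that the target environment's Airflow web server accepts. Keeping the routing table in one place lets a single service fan events out to many team environments.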

– Per-DAG cost

If certain data lakes only require a Medium or Small environment preset to operate, then you’ll be paying more per DAG (but still less overall) than you would with a shared Large environment preset.

– Governing multiple Composer environments

For example, with 10,000 DAGs across several data lakes, the shared-environment approach would require 10 Large Composer environments. The dedicated-environment approach would instead require governing one environment per data lake, each with differing requirements. Check out Cross-project environment monitoring to simplify these tasks.
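The figure of 10 Large environments comes from dividing the total DAG count by the 1,000-DAG guidance and rounding up. The sketch below reproduces that calculation. For comparison, it also counts environments per team if each team's environment were sized with the same guidance. The team split is hypothetical. In practice, smaller teams might choose Medium or Small presets, whose limits this guide does not list.

```python
import math

# DAG guidance for the Large preset quoted earlier in this guide.
LARGE_PRESET_DAGS = 1000


def shared_envs_needed(dags_per_team: dict[str, int], dags_per_env: int = LARGE_PRESET_DAGS) -> int:
    """Large environments needed if all teams' DAGs share a pool of environments."""
    return math.ceil(sum(dags_per_team.values()) / dags_per_env)


def dedicated_envs_needed(dags_per_team: dict[str, int],
                          dags_per_env: int = LARGE_PRESET_DAGS) -> dict[str, int]:
    """Environments per team if each team gets its own, all sized with the same guidance."""
    return {team: max(1, math.ceil(n / dags_per_env)) for team, n in dags_per_team.items()}


# Hypothetical split of 10,000 DAGs across four data lakes.
dags_per_team = {"sales": 4200, "finance": 3100, "hr": 1500, "ops": 1200}
print(shared_envs_needed(dags_per_team))
print(dedicated_envs_needed(dags_per_team))
```

The shared count matches the 10 environments mentioned above. The per-team counts show that, in the dedicated approach, each environment is sized and governed on its own terms.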

Conclusion

A multi-tenant Composer environment is a good option for smaller organizations just getting started with Composer, or organizations with a small number of data lakes and simple workflows. This approach can be more cost-effective than creating and managing multiple Composer environments, and it can also centralize governance and administration.

For larger, mature, or rapidly growing customers, dedicated single-tenant Composer environments for each team are a better option. This strategy prioritizes security, scalability, availability, and developer productivity.

If you’d like to stay up to date with the latest Airflow multi-tenancy discussions, check out the 2023 Airflow Summit’s Multi-tenancy State of the Union.

Cloud Blog
