Saturday, September 7, 2024

Building reusable Machine Learning workflows with Pipeline Templates

One of the best ways to share, reuse, and scale your ML workflows is to run them as pipelines. To maximize their value, it’s important to build these pipelines in such a way that you can easily reproduce runs that produce similar results, as described in the paper “Hidden Technical Debt in Machine Learning Systems”. 

We are excited to announce support for pipeline templates on Vertex AI Pipelines. This blog post demonstrates how to create, upload, and (re)use end-to-end pipeline templates using the Kubeflow Pipelines (KFP) SDK registry client (RegistryClient), Artifact Registry, and Vertex AI Pipelines.

Understanding Pipeline Templates 

A pipeline template is a resource that you can use to publish a workflow definition so that it can be reused. The KFP RegistryClient is a new client interface that you can use with a compatible registry server, such as Artifact Registry, for version control of your Kubeflow Pipelines templates. Using Artifact Registry makes it easier for your organization to:

Publish workflow definitions, which can be executed by other users

Store, manage, and control workflow definitions 

Discover workflows 

Building a Reusable Pipeline Template

You can build, upload, and use a template in just a few steps. Let’s take a look at building a simple, two-step pipeline. We will then upload the template to Artifact Registry, from where we can run it using Vertex AI Pipelines. The notebook can be found on our Google Cloud GitHub repository.

Step 1:  Create an Artifact Repository

The first step is to create a repository in Artifact Registry. This is where you store and track template artifacts. When creating the repository, make sure that you set its format to Kubeflow Pipelines.
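If you prefer to script this step instead of using the console, the sketch below assembles the corresponding gcloud command from Python. The repository name quickstart-kfp-repo and the us-central1 location are taken from the URLs used later in this post. The helper name build_create_repo_command is hypothetical, and nothing is executed unless you uncomment the subprocess.run call, which assumes the gcloud CLI is installed and authenticated.

```python
import subprocess
from typing import List


def build_create_repo_command(repo_name: str, location: str) -> List[str]:
    """Return the gcloud arguments that create a KFP-format repository.

    Hypothetical helper for illustration; it only builds the argument list.
    """
    return [
        "gcloud", "artifacts", "repositories", "create", repo_name,
        f"--location={location}",
        "--repository-format=KFP",
    ]


command = build_create_repo_command("quickstart-kfp-repo", "us-central1")
print(" ".join(command))

# Uncomment to actually create the repository (requires the gcloud CLI):
# subprocess.run(command, check=True)
```

Keeping the command as a list of arguments rather than one shell string avoids quoting problems if the repository name ever comes from user input.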

Step 2: Build a Pipeline

Now it’s time to build and compile an end-to-end pipeline using the KFP DSL. The pipeline in this example has two steps (custom components):

Train a TensorFlow model.

Upload the trained model to the Vertex AI Model Registry.

```python
from kfp import dsl

# model_train and model_upload are the two custom components of this example;
# they are defined elsewhere in the notebook.
@dsl.pipeline(
    name='pipeline_template',
    description="pipeline that trains and uploads a TF Model",
)
def ml_pipeline():

    train_task = model_train(bq_data=TRAINING_DATA, project_id=PROJECT_ID, region=REGION, model_dir=MODEL_DIR)

    # Upload only after training has finished.
    uploader_task = model_upload(project_id=PROJECT_ID, region=REGION, model_dir=MODEL_DIR).after(train_task)
```

(Of course, you can build your own, more advanced pipelines depending on your use case.) Once you have finalized your pipeline, use the compiler to generate the workflow YAML file, template-pipeline.yaml.

```python
from kfp import compiler

# Compile the ml_pipeline function defined above into a workflow YAML file.
compiler.Compiler().compile(
    pipeline_func=ml_pipeline,
    package_path='template-pipeline.yaml')
```

Step 3: Upload the Template

After you have compiled the pipeline, create a RegistryClient that points at your repository.

```python
from kfp.registry import RegistryClient
client = RegistryClient(host=f"https://us-central1-kfp.pkg.dev/{PROJECT_ID}/quickstart-kfp-repo")
```

Now you’re ready to upload the workflow YAML, in this case template-pipeline.yaml, to the repository that you created earlier. For this, you use client.upload_pipeline():

```python
templateName, versionName = client.upload_pipeline(
    file_name="template-pipeline.yaml",
    tags=["v1", "latest"],
    extra_headers={"description": "This is an example pipeline template."})
```

To find the template that you uploaded, navigate to the Pipelines tab in Vertex AI Pipelines.

You can use this central repository to store, manage, and track all of your pipeline templates in one place. 

Step 4: Reuse the Template in Vertex AI Pipelines Using the Vertex AI SDK

The main advantage of storing your pipeline template in a central repository is easy sharing and reuse. We can easily reuse the pipeline template in Vertex AI Pipelines. 

To run (reuse) your pipeline template in Vertex AI Pipelines, you first need to create a Cloud Storage bucket for staging pipeline runs. Then, use the Vertex AI SDK to create a pipeline run from your template in the artifact repository. 

```python
from google.cloud import aiplatform as vertex  # assumed alias for the Vertex AI SDK

job = vertex.PipelineJob(
    display_name="pipeline-template",
    # versionName is the value returned by client.upload_pipeline() in Step 3.
    template_path=(
        f"https://us-central1-kfp.pkg.dev/{PROJECT_ID}/quickstart-kfp-repo/pipeline-template/"
        + versionName
    ),
)

job.submit()
```
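The template path above follows a regular pattern: region, project, repository, template name, and a version reference. As a minimal sketch, the helper below assembles that URL from its parts so it is not retyped for every run. The function name build_template_path and its parameters are hypothetical, and the project ID my-project is a placeholder. The reference argument is meant to be the versionName returned by client.upload_pipeline(); whether a tag such as v1 is also accepted in that position is an assumption to confirm in the Vertex AI documentation.

```python
def build_template_path(
    project_id: str,
    repo: str,
    template_name: str,
    reference: str,
    region: str = "us-central1",
) -> str:
    """Assemble the Artifact Registry URL of one pipeline template version.

    Hypothetical helper; it mirrors the URL layout used in this post.
    """
    return (
        f"https://{region}-kfp.pkg.dev/"
        f"{project_id}/{repo}/{template_name}/{reference}"
    )


# "v1" stands in for the versionName returned by upload_pipeline().
path = build_template_path(
    project_id="my-project",
    repo="quickstart-kfp-repo",
    template_name="pipeline-template",
    reference="v1",
)
print(path)
```

The resulting string can be passed as template_path when creating a PipelineJob.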

You can view the runs created by a specific pipeline version using the Vertex AI SDK for Python. To list the pipeline runs, call PipelineJob.list with a filter on the template URI and version.

```python
# Match runs created from this template and from this exact version.
# Named run_filter to avoid shadowing Python's built-in filter().
run_filter = (
    f'template_uri:"https://us-central1-kfp.pkg.dev/{PROJECT_ID}/quickstart-kfp-repo/pipeline-template/*" AND '
    f'template_metadata.version="{versionName}"'
)

vertex.PipelineJob.list(filter=run_filter)
```
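The filter string combines two conditions: a wildcard match on the template URI and an exact match on the template version. A small sketch of building it from variables follows; the helper name build_run_filter, the project ID my-project, and the version value v1 are hypothetical placeholders, and the filter syntax itself is copied from the example above rather than from a full grammar.

```python
def build_run_filter(template_base_uri: str, version: str) -> str:
    """Build a PipelineJob.list filter for runs of one template version.

    Hypothetical helper; template_base_uri is the template URL without
    the trailing version segment.
    """
    base = template_base_uri.rstrip("/")
    return (
        f'template_uri:"{base}/*" AND '
        f'template_metadata.version="{version}"'
    )


run_filter = build_run_filter(
    "https://us-central1-kfp.pkg.dev/my-project/quickstart-kfp-repo/pipeline-template/",
    "v1",
)
print(run_filter)
```

Stripping the trailing slash before appending the wildcard means the helper produces the same filter whether or not the caller includes the slash.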

You can also see the pipeline run by navigating to the Runs tab in the Vertex Pipelines UI. When you click on the pipeline job you can see its topology and progress. 

Optional: Reuse the Template in Vertex AI Pipelines Via the UI

If you prefer, you can also use the Vertex UI to reuse your pipeline templates. First, navigate to the Pipelines tab in the Vertex Pipelines UI. 

Next, select the pipeline that you want to run and click Create run. In the Create Pipeline Run window, set the details and runtime configuration, and then submit your pipeline run.

What’s Next

Now that you know how to build, upload, and reuse a pipeline template, you’re ready to start deploying some pipeline templates of your own! You can find more information in our docs, or learn more about Vertex AI Pipelines in this Codelab.

Thank you for reading! Have a question or want to chat? Find Erwin on Twitter or LinkedIn.
