Amazon SageMaker Studio is the first fully integrated development environment (IDE) for machine learning (ML). Studio provides a single web-based visual interface where you can perform all ML development steps required to prepare data, as well as build, train, and deploy models. Lifecycle configurations are shell scripts triggered by Studio lifecycle events, such as starting a new Studio notebook. You can use lifecycle configurations to automate customization for your Studio environment. This customization includes installing custom packages, configuring notebook extensions, preloading datasets, and setting up source code repositories. For example, as an administrator for a Studio domain, you may want to save costs by having notebook apps shut down automatically after long periods of inactivity.
The AWS Cloud Development Kit (AWS CDK) is a framework for defining cloud infrastructure through code and provisioning it through AWS CloudFormation stacks. A stack is a collection of AWS resources that can be programmatically updated, moved, or deleted. AWS CDK constructs are the building blocks of AWS CDK applications, representing the blueprint to define cloud architectures.
In this post, we show how to use the AWS CDK to set up Studio, use Studio lifecycle configurations, and enable its access for data scientists and developers in your organization.
The modularity of lifecycle configurations allows you to apply them to all users in a domain or to specific users. This way, you can set up lifecycle configurations and reference them in the Studio kernel gateway or Jupyter server quickly and consistently. The kernel gateway is the entry point to interact with a notebook instance, whereas the Jupyter server represents the Studio instance. This enables you to apply DevOps best practices and meet safety, compliance, and configuration standards across all AWS accounts and Regions. For this post, we use Python as the main language, but the code can be easily changed to other AWS CDK supported languages. For more information, refer to Working with the AWS CDK.
To get started, make sure you have the following prerequisites:
The AWS Command Line Interface (AWS CLI) installed.
The AWS CDK installed. For more information, refer to Getting started with the AWS CDK and Working with the AWS CDK in Python.
An AWS profile with permissions to create AWS Identity and Access Management (IAM) roles, Studio domains, and Studio user profiles.
Clone the GitHub repository
After you clone the repository, you'll see a classic AWS CDK project with the directory studio-lifecycle-config-construct, which contains the construct and resources required to create lifecycle configurations.
AWS CDK constructs
The file we want to inspect is aws_sagemaker_lifecycle.py. This file contains the SageMakerStudioLifeCycleConfig construct we use to set up and create lifecycle configurations.
The SageMakerStudioLifeCycleConfig construct provides the framework for building lifecycle configurations using a custom AWS Lambda function and shell code read in from a file. The construct contains the following parameters:
ID – The name of the current project.
studio_lifecycle_content – The base64-encoded content of the lifecycle script.
studio_lifecycle_tags – Labels you assign to organize AWS resources. They are entered as key-value pairs and are optional for this configuration.
studio_lifecycle_config_app_type – The app type the lifecycle configuration applies to: JupyterServer for the Jupyter server itself, or KernelGateway for an app that corresponds to a running SageMaker image container.
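To make the parameters above concrete, the following is a minimal sketch of how a shell script could be base64 encoded and combined with an app type and optional tags. The helper names, the example script, and the configuration name are hypothetical and are not taken from the repository. The dictionary keys mirror the arguments of the SageMaker create_studio_lifecycle_config API call in Boto3, which we assume is what ultimately receives these values.

```python
import base64

# The two app types named in the parameter list above.
VALID_APP_TYPES = {"JupyterServer", "KernelGateway"}


def encode_lifecycle_script(script_text: str) -> str:
    """Return the shell script as a base64 string."""
    return base64.b64encode(script_text.encode("utf-8")).decode("ascii")


def build_lifecycle_request(name, script_text, app_type, tags=None):
    """Build keyword arguments shaped like create_studio_lifecycle_config."""
    if app_type not in VALID_APP_TYPES:
        raise ValueError(f"Unsupported app type: {app_type!r}")
    request = {
        "StudioLifecycleConfigName": name,
        "StudioLifecycleConfigContent": encode_lifecycle_script(script_text),
        "StudioLifecycleConfigAppType": app_type,
    }
    if tags:  # Tags are optional key-value pairs.
        request["Tags"] = [{"Key": k, "Value": v} for k, v in tags.items()]
    return request


# Hypothetical script and names, for illustration only.
example_script = "#!/bin/bash\nset -eux\necho 'lifecycle script ran'\n"
example_request = build_lifecycle_request(
    "example-lifecycle-config", example_script, "JupyterServer", {"team": "ml"}
)
```

Validating the app type before building the request surfaces a typo at synthesis time rather than during deployment.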
For more information on the Studio notebook architecture, refer to Dive deep into Amazon SageMaker Studio Notebooks architecture.
The following is a code snippet of the Studio lifecycle config construct (aws_sagemaker_lifecycle.py):
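The sketch below is not the repository's aws_sagemaker_lifecycle.py. It is a minimal illustration, under our own assumptions, of what the custom Lambda function behind such a construct might do when AWS CloudFormation sends it Create, Update, and Delete events. The function name on_event, the property names, and the return shape (which follows the AWS CDK provider framework convention) are assumptions. The SageMaker client is passed in as an argument so the logic can be exercised with a stand-in client instead of a real AWS account.

```python
def on_event(event, sagemaker_client):
    """Handle one CloudFormation custom resource event for a lifecycle config."""
    props = event["ResourceProperties"]
    name = props["StudioLifecycleConfigName"]
    request_type = event["RequestType"]

    if request_type in ("Update", "Delete"):
        # Lifecycle configurations cannot be edited in place, so this sketch
        # models an update as delete-then-create (an assumption).
        sagemaker_client.delete_studio_lifecycle_config(
            StudioLifecycleConfigName=name
        )
    if request_type == "Delete":
        return {"PhysicalResourceId": name}

    response = sagemaker_client.create_studio_lifecycle_config(
        StudioLifecycleConfigName=name,
        StudioLifecycleConfigContent=props["StudioLifecycleConfigContent"],
        StudioLifecycleConfigAppType=props["StudioLifecycleConfigAppType"],
    )
    return {
        "PhysicalResourceId": name,
        "Data": {"StudioLifecycleConfigArn": response["StudioLifecycleConfigArn"]},
    }


class FakeSageMakerClient:
    """Stand-in for boto3.client("sagemaker") that only records calls."""

    def __init__(self):
        self.calls = []

    def create_studio_lifecycle_config(self, **kwargs):
        name = kwargs["StudioLifecycleConfigName"]
        self.calls.append(("create", name))
        # Made-up account ID and Region, for illustration only.
        arn = f"arn:aws:sagemaker:us-east-1:111122223333:studio-lifecycle-config/{name}"
        return {"StudioLifecycleConfigArn": arn}

    def delete_studio_lifecycle_config(self, **kwargs):
        self.calls.append(("delete", kwargs["StudioLifecycleConfigName"]))
        return {}


# Hypothetical event, shaped like a CloudFormation custom resource request.
create_event = {
    "RequestType": "Create",
    "ResourceProperties": {
        "StudioLifecycleConfigName": "example-lifecycle-config",
        "StudioLifecycleConfigContent": "IyEvYmluL2Jhc2gK",  # base64 of "#!/bin/bash\n"
        "StudioLifecycleConfigAppType": "JupyterServer",
    },
}
fake_client = FakeSageMakerClient()
create_result = on_event(create_event, fake_client)
```

In a deployed function, a thin handler(event, context) wrapper would call on_event with a real Boto3 SageMaker client.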
After you install and import the construct, you can use it. The following code snippet shows how to create a lifecycle config using the construct in a stack either in app.py or another construct:
Deploy AWS CDK constructs
To deploy your AWS CDK stack, run the following commands in the location where you cloned the repository.
The command may be python instead of python3 depending on your path configurations.
Create a virtual environment:
For macOS/Linux, use python3 -m venv .cdk-venv.
For Windows, use python3 -m venv .cdk-venv.
Activate the virtual environment:
For macOS/Linux, use source .cdk-venv/bin/activate.
For Windows, use .cdk-venv\Scripts\activate.bat.
For PowerShell, use .cdk-venv/Scripts/activate.ps1.
Install the required dependencies:
pip install -r requirements.txt
pip install -r requirements-dev.txt
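At this point, you can optionally synthesize the CloudFormation template for this code by running cdk synth.
Deploy the solution with cdk deploy. If this is the first AWS CDK deployment in the account and Region, run cdk bootstrap first.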
When the stack is successfully deployed, you should be able to view the stack on the CloudFormation console.
You will also be able to view the lifecycle configuration on the SageMaker console.
Choose the lifecycle configuration to view the shell code that runs as well as any tags you assigned.
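The same check can be done from code. The following is a minimal sketch, assuming a Boto3 SageMaker client and its describe_studio_lifecycle_config call, which returns the script content base64 encoded. The helper name fetch_lifecycle_script and the stub client are hypothetical; the stub stands in for a real client so the example runs without AWS credentials.

```python
import base64


def fetch_lifecycle_script(sagemaker_client, config_name):
    """Return the decoded shell script stored in a lifecycle configuration."""
    response = sagemaker_client.describe_studio_lifecycle_config(
        StudioLifecycleConfigName=config_name
    )
    encoded = response["StudioLifecycleConfigContent"]
    return base64.b64decode(encoded).decode("utf-8")


class StubSageMakerClient:
    """Stand-in for boto3.client("sagemaker"), returning a canned response."""

    def describe_studio_lifecycle_config(self, StudioLifecycleConfigName):
        script = "#!/bin/bash\necho 'hello from Studio'\n"
        return {
            "StudioLifecycleConfigName": StudioLifecycleConfigName,
            "StudioLifecycleConfigContent": base64.b64encode(
                script.encode("utf-8")
            ).decode("ascii"),
            "StudioLifecycleConfigAppType": "JupyterServer",
        }


# Hypothetical configuration name, for illustration only.
decoded_script = fetch_lifecycle_script(
    StubSageMakerClient(), "example-lifecycle-config"
)
```

With a real client in place of the stub, the decoded text should match the shell file that the construct read in at deployment.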
Attach the Studio lifecycle configuration
There are multiple ways to attach a lifecycle configuration. In this section, we present two methods: using the AWS Management Console, and programmatically using the infrastructure provided.
Attach the lifecycle configuration using the console
To use the console, complete the following steps:
On the SageMaker console, choose Domains in the navigation pane.
Choose the domain name you’re using and the current user profile, then choose Edit.
Select the lifecycle configuration you want to use and choose Attach.
From here, you can also set it as default.
Attach the lifecycle configuration programmatically
You can also retrieve the ARN of the Studio lifecycle configuration created by the construct and attach it to the Studio construct programmatically. The following code shows the lifecycle configuration ARN being passed to a Studio construct:
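As an illustration of what attaching an ARN involves, the sketch below builds a user settings structure in the shape that the SageMaker update_user_profile and update_domain API calls accept. It is not the Studio construct from the repository. The helper name build_user_settings and the example ARN are hypothetical, and we assume the construct ends up passing an equivalent structure to SageMaker.

```python
# Maps each app type to the user settings section that holds its
# lifecycle configuration ARNs.
_SETTINGS_KEY_BY_APP_TYPE = {
    "JupyterServer": "JupyterServerAppSettings",
    "KernelGateway": "KernelGatewayAppSettings",
}


def build_user_settings(lifecycle_config_arn, app_type="JupyterServer",
                        set_as_default=False):
    """Build a UserSettings dictionary that attaches one lifecycle config."""
    settings_key = _SETTINGS_KEY_BY_APP_TYPE[app_type]
    app_settings = {"LifecycleConfigArns": [lifecycle_config_arn]}
    if set_as_default:
        # A default configuration is listed in both places.
        app_settings["DefaultResourceSpec"] = {
            "LifecycleConfigArn": lifecycle_config_arn
        }
    return {settings_key: app_settings}


# Made-up ARN, for illustration only.
example_arn = (
    "arn:aws:sagemaker:us-east-1:111122223333:"
    "studio-lifecycle-config/example-lifecycle-config"
)
example_settings = build_user_settings(example_arn, set_as_default=True)

# With a real client, the result could be passed as, for example:
#   sagemaker_client.update_user_profile(
#       DomainId=..., UserProfileName=..., UserSettings=example_settings)
```

Passing the same structure as DefaultUserSettings to update_domain would apply the configuration to all users in the domain rather than to a single profile.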
Complete the steps in this section to clean up your resources.
Delete the Studio lifecycle configuration
To delete your lifecycle configuration, complete the following steps:
On the SageMaker console, choose Studio lifecycle configurations in the navigation pane.
Select the lifecycle configuration, then choose Delete.
Delete the AWS CDK stack
When you’re done with the resources you created, you can destroy your AWS CDK stack by running cdk destroy in the location where you cloned the repository.
When asked to confirm the deletion of the stack, enter yes.
You can also delete the stack on the AWS CloudFormation console with the following steps:
On the AWS CloudFormation console, choose Stacks in the navigation pane.
Choose the stack that you want to delete.
In the stack details pane, choose Delete.
Choose Delete stack when prompted.
If you run into any errors, you may have to manually delete some resources depending on your account configuration.
In this post, we discussed how Studio serves as an IDE for ML workloads. Studio offers lifecycle configuration support, which allows you to set up custom shell scripts to perform automated tasks, or set up development environments at launch. We used AWS CDK constructs to build the infrastructure for the custom resource and lifecycle configuration. Constructs are synthesized into CloudFormation stacks that are then deployed to create the custom resource and lifecycle script that is used in Studio and the notebook kernel.
For more information, visit Amazon SageMaker Studio.
About the Authors
Cory Hairston is a Software Engineer with the Amazon ML Solutions Lab. He currently works on providing reusable software solutions.
Alex Chirayath is a Senior Machine Learning Engineer at the Amazon ML Solutions Lab. He leads teams of data scientists and engineers to build AI applications to address business needs.
Gouri Pandeshwar is an Engineer Manager at the Amazon ML Solutions Lab. He and his team of engineers are working to build reusable solutions and frameworks that help accelerate adoption of AWS AI/ML services for customers’ business use cases.