Amazon SageMaker has announced the support of three new completion criteria for Amazon SageMaker automatic model tuning, providing you with an additional set of levers to control the stopping criteria of the tuning job when finding the best hyperparameter configuration for your model.
In this post, we discuss these new completion criteria, when to use them, and some of the benefits they bring.
SageMaker automatic model tuning
Automatic model tuning, also called hyperparameter tuning, finds the best version of a model as measured by the metric we choose. It spins up many training jobs on the dataset provided, using the algorithm chosen and hyperparameters ranges specified. Each training job can be completed early when the objective metric isn’t improving significantly, which is known as early stopping.
Until now, there were limited ways to control the overall tuning job, such as specifying the maximum number of training jobs. However, the selection of this parameter value is heuristic at best. A larger value increases tuning costs, and a smaller value may not yield the best version of the model at all times.
SageMaker automatic model tuning solves these challenges by giving you multiple completion criteria for the tuning job. It’s applied at the tuning level rather than at each individual training job level, which means it operates at a higher abstraction layer.
Benefits of tuning job completion criteria
With better control over when the tuning job will stop, you get the benefit of cost savings by not having the job run for extended periods and being computationally expensive. It also means you can ensure that the job doesn’t stop too early and you get a sufficiently good quality model that meets your objectives. You can choose to stop the tuning job when the models are no longer improving after a set of iterations or when the estimated residual improvement doesn’t justify the compute resources and time.
In addition to the existing maximum number of training job completion criteria MaxNumberOfTrainingJobs, automatic model tuning introduces the option to stop tuning based on a maximum tuning time, Improvement monitoring, and convergence detection.
Let’s explore each of these criteria.
Maximum tuning time
Previously, you had the option to define a maximum number of training jobs as a resource limit setting to control the tuning budget in terms of compute resource. However, this can lead to unnecessary longer or shorter training times than needed or desired.
With the addition of the maximum tuning time criteria, you can now allocate your training budget in terms of amount of time to run the tuning job and automatically terminate the job after a specified amount of time defined in seconds.
As seen above, we use the MaxRuntimeInSeconds to define the tuning time in seconds. Setting the tuning time limit helps you limit the duration of the tuning job and also the projected cost of the experiment.
The total cost before any contractual discount can be estimated with the following formula:
EstimatedComputeSeconds= MaxRuntimeInSeconds * MaxParallelTrainingJobs * InstanceCost
The max runtime in seconds could be used to bound cost and runtime. In other words, it’s a budget control completion criteria.
This feature is part of a resource control criteria and doesn’t take into account the convergence of the models. As we see later in this post, this criteria can be used in combination with other stopping criteria to achieve cost control without sacrificing accuracy.
Desired target metric
Another previously introduced criteria is to define the target objective goal upfront. The criteria monitors the performance of the best model based on a specific objective metric and stops tuning when the models reach the defined threshold in relation to a specified objective metric.
With the TargetObjectiveMetricValue criteria, we can instruct SageMaker to stop tuning the model after the objective metric of the best model has reached the specified value:
In this example, we are instructed SageMaker to stop tuning the model when the objective metric of the best model has reached 0.95.
This method is useful when you have a specific target that you want your model to reach, such as a certain level of accuracy, precision, recall, F1-score, AUC, log-loss, and so on.
A typical use case for this criteria would be for a user who is already familiar with the model performance at given thresholds. A user in the exploration phase may first tune the model with a small subset of a larger dataset to identify a satisfactory evaluation metric threshold to target when training with the full dataset.
This criteria monitors the models’ convergence after each iteration and stops the tuning if the models don’t improve after a defined number of training jobs. See the following configuration:
In this case we set the MaxNumberOfTrainingJobsNotImproving to 10, which means if the objective metric stops improving after 10 training jobs, the tuning will be stopped and the best model and metric reported.
Improvement monitoring should be used to tune a tradeoff between model quality and overall workflow duration in a way that is likely transferable between different optimization problems.
Convergence detection is a completion criteria that lets automatic model tuning decide when to stop tuning. Generally, automatic model tuning will stop tuning when it estimates that no significant improvement can be achieved. See the following configuration:
The criteria is best suited when you initially don’t know what stopping settings to select.
It’s also useful if you don’t know what target objective metric is reasonable for a good prediction given the problem and dataset in hand, and would rather have the tuning job complete when it is no longer improving.
Experiment with a comparison of completion criteria
In this experiment, given a regression task, we run 3 tuning experiments to find the optimal model within a search space of 2 hyperparameters having 200 hyperparameter configurations in total using the direct marketing dataset.
With everything else being equal, the first model was tuned with the BestObjectiveNotImproving completion criteria, the second model was tuned with the CompleteOnConvergence and the third model was tuned with no completion criteria defined.
When describing each job, we can observe that setting the BestObjectiveNotImproving criteria has led to the most optimal resource and time relative to the objective metric with significantly fewer jobs ran.
The CompleteOnConvergence criteria was also able to stop tuning halfway through the experiment resulting in fewer training jobs and shorter training time compared to not setting a criteria.
While not setting a completion criteria resulted in a costly experiment, defining the MaxRuntimeInSeconds as part of the resource limit would be one way of minimizing the cost.
The results above show that when defining a completion criteria, Amazon SageMaker is able to intelligently stop the tuning process when it detects that the model is less likely to improve beyond the current result.
Note that the completion criteria supported in SageMaker automatic model tuning are not mutually exclusive and can be used concurrently when tuning a model.
When more than one completion criteria is defined, the tuning job completes when any of the criteria is met.
For example, a combination of a resource limit criteria like maximum tuning time with a convergence criteria, such as improvement monitoring or convergence detection, may produce an optimal cost control and an optimal objective metrics.
In this post, we discussed how you can now intelligently stop your tuning job by selecting a set of completion criteria newly introduced in SageMaker, such as maximum tuning time, improvement monitoring, or convergence detection.
We demonstrated with an experiment that intelligent stopping based on improvement observation across iteration may lead to a significantly optimized budget and time management compared to not defining a completion criteria.
We also showed that these criteria are not mutually exclusive and can be used concurrently when tuning a model, to take advantage of both, budget control and optimal convergence.
For more details on how to configure and run automatic model tuning, refer to Specify the Hyperparameter Tuning Job Settings.
About the Authors
Doug Mbaya is a Senior Partner Solution architect with a focus in data and analytics. Doug works closely with AWS partners, helping them integrate data and analytics solutions in the cloud.
Chaitra Mathur is a Principal Solutions Architect at AWS. She guides customers and partners in building highly scalable, reliable, secure, and cost-effective solutions on AWS. She is passionate about Machine Learning and helps customers translate their ML needs into solutions using AWS AI/ML services. She holds 5 certifications including the ML Specialty certification. In her spare time, she enjoys reading, yoga, and spending time with her daughters.
Iaroslav Shcherbatyi is a Machine Learning Engineer at AWS. He works mainly on improvements to the Amazon SageMaker platform and helping customers best use its features. In his spare time, he likes to go to gym, do outdoor sports such as ice skating or hiking, and to catch up on new AI research.
Read MoreAWS Machine Learning Blog