To train a machine learning (ML) model, you need a large, high-quality, labeled dataset. Amazon SageMaker Ground Truth helps you build high-quality training datasets for your ML models. With Ground Truth, you can use workers from Amazon Mechanical Turk, a vendor company of your choosing, or an internal, private workforce to create a labeled dataset. You can use the labeled dataset output from Ground Truth to train your own models. You can also use the output as a training dataset for an Amazon SageMaker model.
With Ground Truth, you can create a private workforce of employees or contractors to handle your data within your organization. This lets you support annotation workloads that contain sensitive business data or personally identifiable information (PII) that can’t be handled by external parties. Alternatively, if data annotation requires domain-specific subject matter expertise, you can use a private workforce to route tasks to employees, contractors, or third-party annotators with that domain and industry knowledge. For example, if the task is to label medical images, you could create a private workforce of people knowledgeable about the images in question.
You can configure a private workforce to authenticate using OpenID Connect (OIDC) with your Identity Provider (IdP). In this post, we demonstrate how to configure OIDC with on-premises Active Directory using Active Directory Federation Services (ADFS). After the configuration is set up, you can configure and manage work teams, track worker performance, and set up notifications when labeling tasks are available in Ground Truth.
Solution overview
When you use existing on-premises Active Directory credentials to authenticate your private workforce, you don’t need to worry about managing multiple identities in different environments. Workers use existing Active Directory credentials to federate to your labeling portal.
Prerequisites
Make sure you have the following prerequisites:
A registered public domain
An existing or newly deployed ADFS environment
An AWS Identity and Access Management (IAM) user with permissions to run SageMaker API operations
Additionally, make sure you use Ground Truth in a supported Region.
Configure Active Directory
The Ground Truth private workforce OIDC configuration requires sending a custom claim sagemaker:groups to Ground Truth from your IdP.
Create an AD group named sagemaker (be sure to use all lower-case).
Add the users that will form your private workforce to this group.
Configure ADFS
The next step is to configure an ADFS application that provides the Issuer, ClientId, and ClientSecret values, along with the required and optional claims, that Ground Truth needs from your IdP. Ground Truth authenticates workers by obtaining an authorization code from the AuthorizationEndpoint configured in your IdP.
For more information about the claims your IdP sends to Ground Truth, refer to Send Required and Optional Claims to Ground Truth and Amazon A2I.
Create an application group
To create your application group, complete the following steps:
Open the ADFS Management Console.
Change the ADFS Federation Service Identifier from https://${HostName}/adfs/service/trust to https://${HostName}/adfs.
Right-click Application Groups and choose Add Application Group.
Enter a name (for example, SageMaker Ground Truth Workforce) and description.
Under Template, for Client-Server applications, choose Server application accessing a web API.
Choose Next.
Copy and save the client ID for future reference.
For Redirect URI, use a placeholder such as https://privateworkforce.local.
Choose Add, then choose Next.
Select Generate a shared secret and save the generated value for later use, then choose Next.
In the Configure Web API section, enter the client ID you saved earlier.
Choose Add, then choose Next.
Select Permit everyone under Access Control Policy, then choose Next.
Under Permitted scopes, select openid, then choose Next.
Review the configuration information, then choose Next and Close.
Configure claim descriptions
To configure your claim descriptions, complete the following steps:
In the ADFS Management Console, expand the Service section.
Right-click Claim Descriptions and choose Add Claim Description.
For Display name, enter SageMaker Client ID.
For Short Name, enter sagemaker:client_id.
For Claim identifier, enter sagemaker:client_id.
Select the options to publish the claim to federation metadata for both accept and send.
Choose OK.
Repeat these steps for the remaining claims (Sagemaker Name, Sagemaker Sub, and Sagemaker Groups), using the identifiers sagemaker:name, sagemaker:sub, and sagemaker:groups, as shown in the following screenshot.
Note that your claim identifier is listed as Claim Type.
Configure the application group claim rules
To configure your application group claim rules, complete the following steps:
Choose Application Groups, then choose the application group you just created.
Under Web API, choose the name shown, which opens the Web API properties.
Choose the Issuance Transform Rules tab and choose Add Rule.
Choose Transform an Incoming Claim and provide the following information:
For Claim rule name, enter sagemaker:client_id.
For Incoming claim type, choose OAuth Client Id.
For Outgoing claim type, choose the claim SageMaker Client ID.
Leave other values as default.
Choose Finish.
Choose Add New Rule.
Choose Transform an Incoming Claim and provide the following information:
For Claim rule name, enter sagemaker:sub.
For Incoming claim type, choose Primary SID.
For Outgoing claim type, choose the claim Sagemaker Sub.
Leave other values as default.
Choose Finish.
Choose Add New Rule.
Choose Transform an Incoming Claim and provide the following information:
For Claim rule name, enter sagemaker:name.
For Incoming claim type, choose Name.
For Outgoing claim type, choose the claim Sagemaker Name.
Leave other values as default.
Choose Finish.
Choose Add New Rule.
Choose Send Group Membership as a Claim and provide the following information:
For Claim rule name, enter sagemaker:groups.
For User’s group, choose the sagemaker AD group created earlier.
For Outgoing claim type, choose the claim Sagemaker Groups.
For Outgoing claim value, enter sagemaker.
Choose Finish.
Choose Apply and OK.
You should have four rules, as shown in the following screenshot.
Create and configure an OIDC IdP workforce using the SageMaker API
In this step, you create a workforce from the AWS Command Line Interface (AWS CLI) using an IAM user or role with appropriate permissions.
Run the following AWS CLI command to create a private workforce. The oidc-config parameter takes values that you must obtain from your IdP:
client_id is the client ID, and client_secret is the client secret you obtained when creating your application group.
You can reconstruct AuthorizationEndpoint, TokenEndpoint, UserInfoEndpoint, LogoutEndpoint, and JwksUri by replacing only the sts.example.com portion with your ADFS endpoint.
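As a rough Python counterpart to the workforce-creation step, the following sketch assembles the OidcConfig values from an ADFS host name and passes them to the SageMaker CreateWorkforce API through boto3. The endpoint paths are the ones ADFS typically exposes and are an assumption here; confirm them against your own ADFS deployment. The helper names, the host sts.example.com, and the placeholder credentials are illustrative only.

```python
from typing import Dict


def build_oidc_config(adfs_host: str, client_id: str, client_secret: str) -> Dict[str, str]:
    """Assemble the OidcConfig dictionary for CreateWorkforce.

    Assumption: the paths below are the usual ADFS OAuth/OIDC endpoints.
    Verify each one against your own ADFS environment before use.
    """
    base = f"https://{adfs_host}/adfs"
    return {
        "ClientId": client_id,
        "ClientSecret": client_secret,
        "Issuer": base,  # matches the Federation Service Identifier set earlier
        "AuthorizationEndpoint": f"{base}/oauth2/authorize",
        "TokenEndpoint": f"{base}/oauth2/token",
        "UserInfoEndpoint": f"{base}/userinfo",
        "LogoutEndpoint": f"{base}/oauth2/logout",
        "JwksUri": f"{base}/discovery/keys",
    }


def create_private_workforce(workforce_name: str, oidc_config: Dict[str, str]) -> str:
    """Call CreateWorkforce and return the WorkforceArn.

    Requires boto3 and AWS credentials with SageMaker permissions.
    """
    import boto3  # imported lazily so build_oidc_config works without boto3

    sagemaker = boto3.client("sagemaker")
    response = sagemaker.create_workforce(
        WorkforceName=workforce_name,
        OidcConfig=oidc_config,
    )
    return response["WorkforceArn"]
```

Keeping the configuration builder separate from the API call makes it possible to inspect the endpoints before anything is created in your account.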
The preceding command should successfully return the WorkforceArn. Save this output for reference later.
Use the following code to describe the created workforce to get the SubDomain.
We use this to configure the redirect URI in ADFS. After Ground Truth authenticates a worker, this URI redirects the worker to the worker portal where the workers can access labeling or human review tasks.
Copy the SubDomain and append /oauth2/idpresponse to the end. For example, it should look like https://drxxxxxlf0.labeling.us-east-1.sagemaker.aws/oauth2/idpresponse. You use this URL to update the redirect URI in ADFS.
In the ADFS Management Console, choose the application group you created earlier (for example, SageMaker Ground Truth Workforce).
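The lookup and the URL construction can also be sketched in Python. The function names are hypothetical; the sketch assumes DescribeWorkforce returns the SubDomain as a bare host name, and it tolerates a value that already carries the https:// scheme.

```python
def build_redirect_uri(subdomain: str) -> str:
    """Append /oauth2/idpresponse to a workforce SubDomain."""
    host = subdomain.strip().rstrip("/")
    if not host.startswith("https://"):
        # Assumption: SubDomain is a bare host name, so add the scheme.
        host = f"https://{host}"
    return f"{host}/oauth2/idpresponse"


def get_workforce_subdomain(workforce_name: str) -> str:
    """Fetch the SubDomain with DescribeWorkforce.

    Requires boto3 and AWS credentials with SageMaker permissions.
    """
    import boto3  # imported lazily so build_redirect_uri works without boto3

    sagemaker = boto3.client("sagemaker")
    response = sagemaker.describe_workforce(WorkforceName=workforce_name)
    return response["Workforce"]["SubDomain"]
```

For example, build_redirect_uri(get_workforce_subdomain("my-workforce")) would produce the value to enter as the redirect URI, where my-workforce is a placeholder name.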
Choose the name under Server application.
Select the placeholder URL used earlier and choose Remove.
Enter the appended SubDomain value.
Choose Add.
Choose OK twice.
Validate the OIDC IdP workforce authentication response
Now that you have configured OIDC with your IdP, it’s time to validate the authentication workflow using curl.
Replace the placeholder values with your information, then enter the modified URI in your browser:
You should be prompted to log in with AD credentials. You may receive a 401 Authorization Required error.
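If you prefer to generate the authorization URI rather than edit it by hand, the sketch below builds one from its parts. It assumes the standard OAuth 2.0 authorization code request parameters (client_id, redirect_uri, scope, and response_type); the function name and example values are illustrative.

```python
from urllib.parse import urlencode


def build_authorization_url(
    authorization_endpoint: str,
    client_id: str,
    redirect_uri: str,
    scope: str = "openid",
) -> str:
    """Build the URI that starts the authorization code flow in a browser."""
    query = urlencode(
        {
            "client_id": client_id,
            "redirect_uri": redirect_uri,  # must match the URI registered in ADFS
            "scope": scope,
            "response_type": "code",  # ask the IdP for an authorization code
        }
    )
    return f"{authorization_endpoint}?{query}"
```

urlencode percent-encodes the redirect URI, which avoids copy-and-paste errors when the value contains characters such as colons and slashes.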
Copy the code parameter from the query string in your browser’s address bar and use it in the following curl command. The portion you need to copy starts with code=. Replace the code value in the command with the value you copied, and change the values of url, client_id, client_secret, and redirect_uri:
url is the token endpoint from ADFS.
client_id is the client ID from the application group in ADFS.
client_secret is the client secret from ADFS.
After making the appropriate modifications, copy the entire command and run it from a terminal.
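The same token exchange can be sketched with the Python standard library. This is a minimal illustration, assuming the standard OAuth 2.0 authorization_code grant with form-encoded parameters; the function names are hypothetical and the request is only sent when exchange_code_for_tokens is called.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit
from urllib.request import Request, urlopen


def extract_code(redirected_url: str) -> str:
    """Return the code query parameter from the URL the browser was redirected to."""
    params = parse_qs(urlsplit(redirected_url).query)
    if "code" not in params:
        raise ValueError("no code parameter found in the URL")
    return params["code"][0]


def build_token_request(
    token_endpoint: str,
    code: str,
    client_id: str,
    client_secret: str,
    redirect_uri: str,
) -> Request:
    """Prepare the POST request that exchanges the code for tokens."""
    body = urlencode(
        {
            "grant_type": "authorization_code",
            "code": code,
            "client_id": client_id,
            "client_secret": client_secret,
            "redirect_uri": redirect_uri,
        }
    ).encode("ascii")
    return Request(
        token_endpoint,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def exchange_code_for_tokens(request: Request) -> dict:
    """Send the request and parse the JSON response (performs a network call)."""
    with urlopen(request) as response:
        return json.load(response)
```

Authorization codes are short-lived and single-use, so run the exchange soon after copying the code from the browser.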
The command returns an access token in JWT format.
Copy the token into the encoded box of a JWT decoding tool and decode it.
The decoded message should contain the required claims you configured. If the claims are present, proceed to the next step; if not, ensure you have followed all the steps outlined so far.
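To check the claims without pasting a token into an external tool, you can decode the payload locally. The sketch below is for inspection only: it does not verify the token signature. The list of expected claims mirrors the four rules configured in this post; the function names are hypothetical.

```python
import base64
import json
from typing import Dict, Iterable, List

# Claims issued by the ADFS rules configured in this walkthrough.
EXPECTED_CLAIMS = (
    "sagemaker:client_id",
    "sagemaker:sub",
    "sagemaker:name",
    "sagemaker:groups",
)


def decode_jwt_payload(token: str) -> Dict:
    """Decode the payload segment of a JWT WITHOUT verifying its signature."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected a JWT with three dot-separated segments")
    payload = parts[1]
    payload += "=" * (-len(payload) % 4)  # restore the stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload))


def missing_claims(claims: Dict, expected: Iterable[str] = EXPECTED_CLAIMS) -> List[str]:
    """Return the expected claim names that are absent from the decoded payload."""
    return [name for name in expected if name not in claims]
```

An empty list from missing_claims(decode_jwt_payload(token)) indicates that all four configured claims are present in the token.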
Next, run the following command from a terminal after making the necessary modifications. Replace the value after Bearer with the access_token from the preceding command’s output, and replace the userinfo endpoint with your own.
The output from this command may look similar to the following code:
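The UserInfo call can be sketched in Python as a GET request that carries the access token in a Bearer Authorization header. The function names and example endpoint are illustrative, and the request is only sent when fetch_userinfo is called.

```python
import json
from urllib.request import Request, urlopen


def build_userinfo_request(userinfo_endpoint: str, access_token: str) -> Request:
    """Prepare a GET request to the UserInfo endpoint with a Bearer token."""
    return Request(
        userinfo_endpoint,
        headers={"Authorization": f"Bearer {access_token}"},
        method="GET",
    )


def fetch_userinfo(request: Request) -> dict:
    """Send the request and parse the JSON response (performs a network call)."""
    with urlopen(request) as response:
        return json.load(response)
```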
Now that you have successfully validated your OIDC configuration, it’s time to create the work teams.
Create a private work team
To create a private work team, complete the following steps:
On the Ground Truth console, choose Labeling workforces.
Select Private.
In the Private teams section, choose Create private team.
In the Team details section, enter a team name.
In the Add workers section, enter the name of a single user group.
All workers associated with this group in your IdP are added to this work team.
To add more than one user group, choose Add new user group and enter the names of the user groups you want to add to this work team. Enter one user group per line.
Optionally, select an Amazon Simple Notification Service (Amazon SNS) topic. For Ground Truth labeling jobs, if your JWT includes an email address for workers, Ground Truth then notifies them when a new labeling task is available.
Choose Create private team.
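The console steps above correspond to the SageMaker CreateWorkteam API. The following sketch shows one way to express the same request with boto3, assuming an OIDC workforce whose members are defined by IdP group names. The helper names and all argument values are placeholders.

```python
from typing import Dict, Iterable, Optional


def build_workteam_request(
    team_name: str,
    workforce_name: str,
    groups: Iterable[str],
    description: str,
    sns_topic_arn: Optional[str] = None,
) -> Dict:
    """Assemble the keyword arguments for CreateWorkteam."""
    request: Dict = {
        "WorkteamName": team_name,
        "WorkforceName": workforce_name,
        "Description": description,
        # Workers in these IdP groups become members of the work team.
        "MemberDefinitions": [{"OidcMemberDefinition": {"Groups": list(groups)}}],
    }
    if sns_topic_arn:
        # Optional: topic used to notify workers about new labeling tasks.
        request["NotificationConfiguration"] = {"NotificationTopicArn": sns_topic_arn}
    return request


def create_private_workteam(request: Dict) -> str:
    """Call CreateWorkteam and return the WorkteamArn.

    Requires boto3 and AWS credentials with SageMaker permissions.
    """
    import boto3  # imported lazily so build_workteam_request works without boto3

    sagemaker = boto3.client("sagemaker")
    return sagemaker.create_workteam(**request)["WorkteamArn"]
```

With the configuration in this post, the groups argument would be ["sagemaker"], matching the outgoing value of the sagemaker:groups claim.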
Test access to the private labeling portal
To test your access, browse to https://console.aws.amazon.com/sagemaker/groundtruth#/labeling-workforces and open the labeling portal sign-in URL in a new browser window or in incognito mode.
Log in with your IdP credentials. If authentication is successful, you should be redirected to the portal.
Cost
You are charged based on the number of objects labeled by your private workforce. For more information, refer to Amazon SageMaker Data Labeling Pricing.
Clean up
You can delete the private workforce using the SageMaker DeleteWorkforce API. If you have work teams associated with the private workforce, you must delete them before deleting the workforce. For more information, see Delete a work team.
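The ordering requirement can be made explicit in code. This sketch first builds a cleanup plan that deletes every work team before the workforce, then runs it with boto3. The function names are hypothetical, and you supply the team names yourself.

```python
from typing import Dict, Iterable, List, Tuple

Step = Tuple[str, Dict[str, str]]


def plan_cleanup(workforce_name: str, team_names: Iterable[str]) -> List[Step]:
    """Return the API calls in the order they must run: teams first, workforce last."""
    plan: List[Step] = [
        ("delete_workteam", {"WorkteamName": name}) for name in team_names
    ]
    plan.append(("delete_workforce", {"WorkforceName": workforce_name}))
    return plan


def run_cleanup(plan: List[Step]) -> None:
    """Execute each planned call against SageMaker.

    Requires boto3 and AWS credentials with SageMaker permissions.
    """
    import boto3  # imported lazily so plan_cleanup works without boto3

    sagemaker = boto3.client("sagemaker")
    for operation, kwargs in plan:
        getattr(sagemaker, operation)(**kwargs)
```

Printing the plan before calling run_cleanup gives you a chance to confirm exactly which resources will be deleted.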
Summary
In this post, we demonstrated how to configure an OIDC application with Active Directory Federation Services and use your existing Active Directory credentials to authenticate to a Ground Truth labeling portal.
We’d love to hear from you. Let us know what you think in the comments section.
About the authors
Adeleke Coker is a Global Solutions Architect with AWS. He works with customers globally to provide guidance and technical assistance in deploying production workloads at scale on AWS. In his spare time, he enjoys learning, reading, gaming, and watching sporting events.
Aishwarya Kaushal is a Senior Software Engineer at Amazon. Her focus is on solving challenging problems using machine learning, building scalable AI solutions using distributed systems and helping customers to adopt the new features/products. In her spare time, Aishwarya enjoys watching sci-fi movies, listening to music and dancing.
Read more: AWS Machine Learning Blog