Amazon Translate is a neural machine translation service that delivers fast, high-quality, and affordable language translation in 71 languages and 4,970 language pairs. Amazon Translate is great for performing batch translation when you have large quantities of pre-existing text to translate and real-time translation when you want to deliver on-demand translations of content as a feature of your applications. It can also handle documents that are written in multiple languages.
Document automation is a common use case where machine learning (ML) can be applied to simplify storing, managing, and extracting insights from documents. In this post, we look at how to run batch translation jobs using the Boto3 Python library from an Amazon SageMaker notebook instance. You can also generalize this process to run batch translation jobs from other AWS compute services.
Roles and permissions
We start by creating an AWS Identity and Access Management (IAM) role and access policy to allow SageMaker to run batch translation jobs. For a simple text translation (for example, text under 5,000 bytes), the call is synchronous and the code running in the SageMaker notebook passes the text to Amazon Translate as bytes. A batch translation job works differently: the files are stored in an Amazon Simple Storage Service (Amazon S3) bucket, and Amazon Translate reads them directly instead of receiving the data from the notebook code.
This section creates the permissions needed to allow Amazon Translate to access the S3 files.
On the IAM console, choose Roles.
Choose Create a role.
Choose AWS service as your trusted entity.
For Common use cases, choose EC2 or Lambda (for this post, we choose Lambda).
Choose Next: Permissions.
For this post, we create a policy that is scoped to the buckets we need rather than one that grants broad access.
Choose Create policy.
On the JSON tab, enter the following policy code, which for this post is named policy-rk-read-write (also provide the name of the bucket containing the translated files):
On the Create role page, attach your new policy to the role.
For Role name, enter a name (for this post, we name it translates3access2).
Choose Create role.
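The policy code entered on the JSON tab is not reproduced here, so the following is only a minimal sketch of what a bucket-scoped read/write policy could look like. The helper name build_s3_access_policy and the bucket names are hypothetical, and the choice of S3 actions (GetObject, ListBucket, PutObject) is an assumption about what a batch translation job needs, not the exact policy used in this post.

```python
import json


def build_s3_access_policy(input_bucket: str, output_bucket: str) -> dict:
    """Build a bucket-scoped IAM policy document (illustrative only).

    Assumption: Amazon Translate reads source files from `input_bucket`
    and writes translated files to `output_bucket`.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Read the source documents.
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": [f"arn:aws:s3:::{input_bucket}/*"],
            },
            {
                # List the contents of both buckets.
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": [
                    f"arn:aws:s3:::{input_bucket}",
                    f"arn:aws:s3:::{output_bucket}",
                ],
            },
            {
                # Write the translated documents.
                "Effect": "Allow",
                "Action": "s3:PutObject",
                "Resource": [f"arn:aws:s3:::{output_bucket}/*"],
            },
        ],
    }


# Hypothetical bucket names; replace them with your own.
policy = build_s3_access_policy("my-input-bucket", "my-output-bucket")
print(json.dumps(policy, indent=2))
```

The printed JSON can be pasted on the JSON tab. If the source and translated files share one bucket, pass the same name for both arguments.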
Everything so far is a common workflow for creating a role. Next, we edit the role's trust relationship so that Amazon Translate, rather than Lambda, is the trusted service.
On the IAM console, choose the role you just created.
On the Trust relationships tab, choose Edit trust relationship.
In the Service section, replace the service name with translate.
For example, the following screenshot shows the code with Service defined as lambda.amazonaws.com.
The following screenshot shows the updated code as translate.amazonaws.com.
Choose Update Trust Policy.
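As a rough illustration of the edit described above, the following sketch builds a trust policy document before and after the change. The helper name build_trust_policy is hypothetical, and the document shape is assumed to be the standard single-statement sts:AssumeRole trust policy that the console generates for a service role.

```python
import json


def build_trust_policy(service: str = "translate.amazonaws.com") -> dict:
    """Build a trust policy that lets one AWS service assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": service},
                "Action": "sts:AssumeRole",
            }
        ],
    }


# Before the edit: the role trusts Lambda (the use case chosen at creation).
before = build_trust_policy("lambda.amazonaws.com")

# After the edit: the role trusts Amazon Translate instead.
after = build_trust_policy("translate.amazonaws.com")
print(json.dumps(after, indent=2))
```

Only the Service value changes between the two documents; the rest of the trust policy stays the same.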
Use a SageMaker notebook with Boto3
We can now run a Jupyter notebook on SageMaker. Every notebook instance has an execution role, which we use to grant permissions for Amazon Translate. If you’re performing a synchronous translation with a short text, all you need to do is provide TranslateFullAccess to this role. In production, you can narrow down the permissions with granular Amazon Translate access.
On the SageMaker console, choose the notebook instance you created.
In the Permissions and encryption section, choose the role.
Choose Attach policies.
Search for and choose TranslateFullAccess.
If you haven’t already configured this role to have access to Amazon S3, you can do so following the same steps.
You can also choose to give access to all S3 buckets or specific S3 buckets when you create a SageMaker notebook instance and create a new role.
For this post, we attach the AmazonS3FullAccess policy to the role.
Run an Amazon Translate synchronous call
You can now run a simple synchronous Amazon Translate call on your SageMaker notebook.
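A synchronous call could look something like the following minimal sketch. It wraps the Boto3 translate_text operation in a hypothetical helper, translate_text_sync, that takes the client as an argument; the sample text and language codes are placeholders rather than values from this post.

```python
def translate_text_sync(client, text, source_lang="en", target_lang="es"):
    """Translate a short string with one synchronous Amazon Translate call.

    `client` is expected to be a Boto3 Amazon Translate client.
    """
    response = client.translate_text(
        Text=text,
        SourceLanguageCode=source_lang,
        TargetLanguageCode=target_lang,
    )
    return response["TranslatedText"]


# In the notebook (requires boto3 and an execution role with
# Amazon Translate permissions):
#   import boto3
#   translate = boto3.client("translate")
#   print(translate_text_sync(translate, "Hello, world", "en", "es"))
```

Because the text travels in the request itself, this call needs only Amazon Translate permissions on the execution role and no S3 access.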
Run an Amazon Translate asynchronous call
When you run a batch translation job using Boto3, one of the parameters is DataAccessRoleArn. This is the ARN of the IAM role that Amazon Translate uses to access data in the S3 bucket (for this post, translates3access2). The SageMaker execution role we identified earlier must be able to pass this role to Amazon Translate. On the console, the role is passed to Amazon Translate directly; when the job is started from code in a SageMaker notebook, the execution role needs explicit permission to pass it.
You first need to locate your role ARN.
On the IAM console, choose the role you created (translates3access2).
On the Summary page, copy the role ARN.
Create a new policy (for this post, we call it IAMPassPolicyTranslate).
Enter the following JSON code (provide your role ARN):
You can skip the tags section and choose Next.
Provide a name for the policy (for this post, we name it IAMPassPolicyTranslate).
Any role with this policy attached can now pass the translates3access2 role.
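The JSON for this policy is not reproduced here, so the following is a minimal sketch of a pass-role policy, assuming a single statement that allows iam:PassRole on the copied role ARN. The helper name build_pass_role_policy and the account ID in the example ARN are hypothetical.

```python
import json


def build_pass_role_policy(role_arn: str) -> dict:
    """Build a policy that allows passing exactly one IAM role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "iam:PassRole",
                "Resource": role_arn,
            }
        ],
    }


# Placeholder account ID; use the role ARN copied from the Summary page.
example_arn = "arn:aws:iam::123456789012:role/translates3access2"
print(json.dumps(build_pass_role_policy(example_arn), indent=2))
```

Restricting Resource to one role ARN keeps the execution role from passing any other role in the account.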
The next step is to attach this policy to the SageMaker execution role.
Choose the execution role.
Choose Attach policies.
Attach the policy you just created (IAMPassPolicyTranslate).
You can now run the code in the SageMaker notebook instance.
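The batch job code itself is not reproduced in this text. As a hedged sketch, the following uses the Boto3 start_text_translation_job and describe_text_translation_job operations. The helper names, job name, S3 URIs, account ID, language codes, and content type are all placeholders chosen for illustration.

```python
import time


def start_batch_translation(client, job_name, input_s3_uri, output_s3_uri,
                            data_access_role_arn, source_lang="en",
                            target_langs=("es",), content_type="text/plain"):
    """Start an asynchronous batch translation job and return its job ID."""
    response = client.start_text_translation_job(
        JobName=job_name,
        InputDataConfig={"S3Uri": input_s3_uri, "ContentType": content_type},
        OutputDataConfig={"S3Uri": output_s3_uri},
        # The role Amazon Translate uses to read and write the S3 data.
        DataAccessRoleArn=data_access_role_arn,
        SourceLanguageCode=source_lang,
        TargetLanguageCodes=list(target_langs),
    )
    return response["JobId"]


def wait_for_job(client, job_id, poll_seconds=30):
    """Poll the job until it reaches a terminal status, then return it."""
    terminal = {"COMPLETED", "COMPLETED_WITH_ERROR", "FAILED", "STOPPED"}
    while True:
        props = client.describe_text_translation_job(JobId=job_id)[
            "TextTranslationJobProperties"
        ]
        if props["JobStatus"] in terminal:
            return props["JobStatus"]
        time.sleep(poll_seconds)


# In the notebook (requires boto3; all values below are placeholders):
#   import boto3
#   translate = boto3.client("translate")
#   job_id = start_batch_translation(
#       translate,
#       job_name="batch-translate-demo",
#       input_s3_uri="s3://my-input-bucket/source/",
#       output_s3_uri="s3://my-output-bucket/translated/",
#       data_access_role_arn="arn:aws:iam::123456789012:role/translates3access2",
#   )
#   print(wait_for_job(translate, job_id))
```

If the execution role lacks the pass-role policy, the start call is the step that should fail with an access error, which makes it a quick check that the earlier IAM steps took effect.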
You have seen how to run batch jobs using Amazon Translate in a SageMaker notebook. You can apply the same process to run the code using Amazon Elastic Container Service (Amazon ECS), Amazon Elastic Kubernetes Service (Amazon EKS), Amazon Elastic Compute Cloud (Amazon EC2), or other services. As a next step, you can combine services like Amazon Comprehend, Amazon Transcribe, or Amazon Kendra to automate managing, searching, and adding metadata to your documents or textual data.
For more information about Amazon Translate, see Amazon Translate resources.
About the Authors
Raj Kadiyala is an AI/ML Tech Business Development Manager in the AWS WWPS Partner Organization. Raj has over 12 years of experience in machine learning and likes to spend his free time exploring machine learning for practical everyday solutions and staying active in the great outdoors of Colorado.
Watson G. Srivathsan is the Sr. Product Manager for Amazon Translate, the AWS natural language processing service. On weekends you will find him exploring the outdoors in the Pacific Northwest.