You Can’t Fail To Learn A New Language With These AWS AI Services: Translate, Rekognition, Polly

As AI improves, machines are doing more and more jobs that only humans used to do. According to one report, robots will take 800 million jobs by 2030. Tomorrow will be different, and we should prepare our kids for it. According to Jack Ma, we should teach our kids soft skills, sports, art and music instead of pouring information into them. Our kids should also be world citizens, and I believe that knowing several languages is very important for being one.

In April, I talked at the Serverless Turkey Meetup about AWS AI services such as Polly, Lex and Rekognition, which were introduced at AWS re:Invent 2016 (slides here). During that talk, I promised to write about the services introduced at AWS re:Invent 2017, such as Translate and Transcribe. When I found time to prepare the blog post, I started thinking about a good use case for Amazon Translate.

While thinking about different use cases, I dreamed of building an e-learning application for my twins that is easy to use and effective, so they can learn Spanish as a second language quickly. So I prepared a web application that kids can use to translate a spoken word, take a pronunciation quiz and read handwritten text in a foreign language.

The application uses Amazon Polly to turn words into speech in different languages, Amazon Translate for translation, and Amazon Rekognition for reading text in images.

Use cases for the demo app

Although I tried to use Amazon Transcribe to turn speech into text, I noticed that the service is not suitable for interactive real-time usage, so I decided to use the Google Speech-to-Text API instead. I believe Transcribe will be improved for real-time usage, and it may be the subject of another post.

I will show how to run this demo app step by step. Here are the requirements.

Requirements

1. Git

2. An AWS account

3. A Google Cloud account

4. A web server or an Amazon S3 bucket

Steps are below.

1. Get the code

2. Prepare AWS account

3. Prepare Google Cloud account

4. Run the app

Let’s start.

1. Get the code

Clone the GitHub repository.

git clone https://github.com/ceyhunozgun/awstranslator

2. Prepare AWS account

Change into the project directory and create a text file named credentials.js. Put your AWS access key ID and secret access key in it. You can create an access key for your AWS account here. The credentials should have permissions for the Polly, Translate and Rekognition services.

Please note that for this demo app, I have put the credentials directly in JavaScript for simplicity. Never use them this way in production. To protect your credentials, use a more secure approach such as Amazon Cognito instead.

Set your AWS region accordingly.
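As a rough illustration, credentials.js could look something like the sketch below. The variable names (`awsAccessKeyId`, `awsSecretAccessKey`, `awsRegion`), the placeholder values and the helper `credentialsLookConfigured` are assumptions made for this example, so check the repository for the names the app actually expects.

```javascript
// credentials.js (hypothetical layout; variable names are illustrative only)
// Demo only: anyone who can load this file in a browser can read these keys.
var awsAccessKeyId = 'YOUR_ACCESS_KEY_ID';
var awsSecretAccessKey = 'YOUR_SECRET_ACCESS_KEY';
// Pick a region where Translate, Polly and Rekognition are all available.
var awsRegion = 'us-east-1';

// Hypothetical sanity check: returns false while the placeholders above
// have not been replaced, so a forgotten edit is caught early.
function credentialsLookConfigured(keyId, secret) {
  return typeof keyId === 'string' && typeof secret === 'string' &&
    keyId.length > 0 && secret.length > 0 &&
    keyId.indexOf('YOUR_') !== 0 && secret.indexOf('YOUR_') !== 0;
}
```

A check like this can run once at page load to show a friendly message instead of a signature error from AWS.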

3. Prepare Google Cloud account

To use the Google Speech API, you should create an API key, create a client ID and enable the Speech API. Then put your API key and client ID into the app's code. For a quick introduction to the Google Speech API, you can look here.

4. Run the app

To run the application, you should serve the HTML file from a web server, because the Google Speech API does not work on file:// URLs. You can use an HTTP server such as MiniWeb on your computer, or you can host the app in an S3 bucket.

If everything is OK, you can use the app to translate sentences, take a pronunciation quiz and read the text in pictures, as shown in the video below.

Demo video of the app

How does it work?

Now that you have watched and played with the app, here are the steps for developing its features.

AWS JavaScript SDK

For this app, I used the AWS JavaScript SDK to access AWS services from the browser. Amazon Translate is not included in the default build. To use it in the browser, we can use the SDK Builder. Go to the SDK Builder site, select Translate and click ‘Build’ to download your customized AWS JS SDK.

Initializing AWS Services

To use the AWS services, you should first initialize them: set the region and credentials on the SDK, then create a client object for each service.
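A minimal sketch of that initialization with the AWS SDK for JavaScript (v2) might look like this. The function name `initAwsServices` and its parameters are made up for this example; `sdk` stands for the global `AWS` object that the customized browser build provides.

```javascript
// Sketch: configure the SDK and create one client per service.
// `sdk` is expected to be the global `AWS` object from the SDK build.
function initAwsServices(sdk, accessKeyId, secretAccessKey, region) {
  sdk.config.region = region;
  // Static keys are for demos only; see the warning about production use.
  sdk.config.credentials = new sdk.Credentials(accessKeyId, secretAccessKey);
  return {
    translate: new sdk.Translate(),
    polly: new sdk.Polly(),
    rekognition: new sdk.Rekognition()
  };
}

// Usage in the browser (the key and region values are placeholders):
// var services = initAwsServices(AWS, 'ACCESS_KEY_ID', 'SECRET_KEY', 'us-east-1');
```

Passing the SDK object in as a parameter keeps the function easy to try out with a stub. For production, the static keys could be replaced with `AWS.CognitoIdentityCredentials`, in line with the Cognito recommendation above.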

Initializing Google Services

The Google API client also has to be initialized before the Speech API can be called.
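One possible way to do this with the Google API Client Library for JavaScript is sketched below. It is an assumption about the approach, not the app's actual code: the function name `initGoogleSpeech` is made up, and for brevity the sketch passes only the API key and leaves out the client ID. `gapiObj` stands for the global `gapi` object loaded from https://apis.google.com/js/api.js.

```javascript
// Discovery document that describes the Speech API (v1) to the gapi client.
var SPEECH_DISCOVERY_DOC = 'https://speech.googleapis.com/$discovery/rest?version=v1';

// Sketch: load the gapi client module, then initialize it with the API key.
// onReady is called when the client is usable, onError if init fails.
function initGoogleSpeech(gapiObj, apiKey, onReady, onError) {
  gapiObj.load('client', function () {
    gapiObj.client.init({
      apiKey: apiKey,
      discoveryDocs: [SPEECH_DISCOVERY_DOC]
    }).then(onReady, onError);
  });
}

// Usage in the browser (the key is a placeholder):
// initGoogleSpeech(gapi, 'YOUR_API_KEY', function () { /* enable the UI */ }, console.error);
```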

Using Amazon Translate API

To translate text from one language to another, the app uses the TranslateText method of the Amazon Translate API.
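The call can be sketched as follows with the AWS SDK for JavaScript (v2). The wrapper names `buildTranslateParams` and `translateText` are made up for this example; `translateClient` is assumed to be an `AWS.Translate` instance.

```javascript
// Build the request for the TranslateText operation.
function buildTranslateParams(text, sourceLang, targetLang) {
  return {
    Text: text,
    SourceLanguageCode: sourceLang, // e.g. 'en'
    TargetLanguageCode: targetLang  // e.g. 'es'
  };
}

// Sketch: translate `text` and hand the translated string to the callback.
function translateText(translateClient, text, sourceLang, targetLang, callback) {
  var params = buildTranslateParams(text, sourceLang, targetLang);
  translateClient.translateText(params, function (err, data) {
    if (err) { callback(err); return; }
    callback(null, data.TranslatedText);
  });
}

// Usage in the browser:
// translateText(new AWS.Translate(), 'Good morning', 'en', 'es',
//   function (err, translated) { if (!err) console.log(translated); });
```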

Converting Speech To Text Using Google Speech to Text API

After the audio data is captured in the browser, the speech is turned into text using the recognize method of the Google Speech API.
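The sketch below shows the shape of a recognize request and how a transcript can be pulled out of the response. It calls the REST endpoint directly with `fetch`, which may differ from how the app makes the call. The function names are made up for this example, and the `LINEAR16` encoding is an assumption about how the browser audio was captured.

```javascript
// Build the JSON body for speech:recognize. `base64Audio` is the captured
// audio, base64-encoded; the encoding value is an assumption (16-bit PCM).
function buildRecognizeRequest(base64Audio, languageCode, sampleRateHertz) {
  return {
    config: {
      encoding: 'LINEAR16',
      sampleRateHertz: sampleRateHertz,
      languageCode: languageCode // e.g. 'es-ES'
    },
    audio: { content: base64Audio }
  };
}

// Join the top alternative of every result into one transcript string.
function extractTranscript(response) {
  var results = (response && response.results) || [];
  return results.map(function (r) {
    var best = r.alternatives && r.alternatives[0];
    return best && best.transcript ? best.transcript.trim() : '';
  }).filter(Boolean).join(' ');
}

// Sketch: send the audio to the REST endpoint; resolves to the transcript.
function recognizeSpeech(apiKey, base64Audio, languageCode, sampleRateHertz) {
  var url = 'https://speech.googleapis.com/v1/speech:recognize?key=' +
    encodeURIComponent(apiKey);
  return fetch(url, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(buildRecognizeRequest(base64Audio, languageCode, sampleRateHertz))
  }).then(function (res) {
    if (!res.ok) { throw new Error('Speech API returned HTTP ' + res.status); }
    return res.json();
  }).then(extractTranscript);
}
```

For the pronunciation quiz, the returned transcript can then be compared with the expected word.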

Reading Text in Images Using Amazon Rekognition

When we click the video in the app, a picture is taken and sent to the Amazon Rekognition DetectText method.
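A sketch of this flow is given below, assuming the frame is grabbed from a `<video>` element through a canvas and sent as raw bytes. All function names are made up for this example; `rekognitionClient` is assumed to be an `AWS.Rekognition` instance from the AWS SDK for JavaScript (v2).

```javascript
// Convert a data URL (e.g. from canvas.toDataURL) to raw bytes. Browser only.
function dataUrlToBytes(dataUrl) {
  var binary = atob(dataUrl.split(',')[1]);
  var bytes = new Uint8Array(binary.length);
  for (var i = 0; i < binary.length; i++) { bytes[i] = binary.charCodeAt(i); }
  return bytes;
}

// Draw the current video frame on a canvas and return it as JPEG bytes.
function captureFrame(video) {
  var canvas = document.createElement('canvas');
  canvas.width = video.videoWidth;
  canvas.height = video.videoHeight;
  canvas.getContext('2d').drawImage(video, 0, 0);
  return dataUrlToBytes(canvas.toDataURL('image/jpeg'));
}

// Keep only whole lines; DetectText also returns each word separately.
function extractLines(data) {
  return (data.TextDetections || [])
    .filter(function (d) { return d.Type === 'LINE'; })
    .map(function (d) { return d.DetectedText; });
}

// Sketch: send the image bytes to DetectText and return the detected lines.
function readTextInImage(rekognitionClient, imageBytes, callback) {
  rekognitionClient.detectText({ Image: { Bytes: imageBytes } }, function (err, data) {
    if (err) { callback(err); return; }
    callback(null, extractLines(data));
  });
}

// Usage in the browser:
// readTextInImage(new AWS.Rekognition(), captureFrame(videoElement),
//   function (err, lines) { if (!err) console.log(lines.join('\n')); });
```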

Converting Text To Speech Using Amazon Polly

To read a text aloud in a specified language, the app uses the Amazon Polly SynthesizeSpeech method.
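A minimal sketch of the call, using the AWS SDK for JavaScript (v2), is shown below. The function names and the small language-to-voice table are assumptions for this example (Joanna and Conchita are existing Polly voices for US English and Castilian Spanish); `pollyClient` is assumed to be an `AWS.Polly` instance.

```javascript
// Hypothetical mapping from language code to a Polly voice.
var VOICE_BY_LANGUAGE = { en: 'Joanna', es: 'Conchita' };

// Build the request for the SynthesizeSpeech operation.
function buildSpeechParams(text, languageCode) {
  return {
    Text: text,
    OutputFormat: 'mp3',
    VoiceId: VOICE_BY_LANGUAGE[languageCode] || VOICE_BY_LANGUAGE.en
  };
}

// Sketch: synthesize `text` and hand the MP3 bytes to the callback.
function synthesize(pollyClient, text, languageCode, callback) {
  pollyClient.synthesizeSpeech(buildSpeechParams(text, languageCode), function (err, data) {
    if (err) { callback(err); return; }
    callback(null, data.AudioStream);
  });
}

// Play MP3 bytes through an Audio element. Browser only.
function playMp3(bytes) {
  var blob = new Blob([bytes], { type: 'audio/mpeg' });
  var audio = new Audio(URL.createObjectURL(blob));
  audio.play();
  return audio;
}

// Usage in the browser:
// synthesize(new AWS.Polly(), 'Buenos días', 'es',
//   function (err, bytes) { if (!err) playMp3(bytes); });
```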

Summary

In this post, I have shown how easy it is to use AWS AI services such as Amazon Translate, Polly and Rekognition for learning new languages.

You can find the code here.

My other post on Amazon Rekognition, Lex and Polly is below:

· Serverless Allergy Checker with Amazon Rekognition, Lex, Polly, DynamoDB, S3 and Lambda

I will continue to write about AWS AI services.

If you liked this post, please share or clap.

Also, I would like to hear your comments about different use cases of AWS AI services.