71

Presidential Dad Jokes with Azure Custom Voice service

 4 years ago
source link: https://www.tuicool.com/articles/QZR7BnU
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

I went recently to Hack The North in Waterloo, Ontario. Way up there in Canada. Land of the maple syrup and overtly polite folks. It was a very intense moment of pure creativity shared with over 1,500 students. I saw many projects, but one caught my eye. It was a funny joke, but it sparked some ideas.

The project recognized your emotion through Azure Cognitive Services Face API , then, from a list of prewritten jokes (or insults), tell you a joke (or insult). What’s the twist? They were using Trump’s very distinctive voice. Politics aside, my first question was, how did they make it ? I’ve then on set on a path to reproduce what they did. At the very least, reproduce the custom voice.

Why would I want to do that? Custom voices for which you can use as a Text-To-Speech service makes for automation potential. Whether it is for phone assistant, personal assistant, bots, or jokes, having a custom voice that can tell you how bad you are at keeping up with memes and social media is very tempting.

I didn’t need any more reason to start building one. Let me show you how.

Setting a goal

The Hackathon team created a voice copying President Trump. I have no idea where they took their data, but I needed a way to get similar data. Two things are necessary when building a custom voice model.

  • Audio file (mp3/wav)
  • Matching transcript

The good news is that a President’s speeches are recorded and transcripted pretty thoroughly by multiple people. A small search for president speech recording led me to the American Rhetoric website and their online speech bank. From there, I saw that they had some President Obama speeches. Not only some speeches but speeches since before he ran for president.

Those speeches also include full transcripts.

Our goal? Train a model that will be able to reproduce President Obama’s voice.

Downloading the training data

Go and download a few Obama Speeches as MP3s. Ensure that the speeches that you pick only have a single speaker (President Obama). Otherwise, the model won’t work correctly.

Create a separate .txt file in UTF-8 with BOM with the same name as the MP3 file.

getting-data.png

Then, we’re going to create two zip files. One with all the audio files named PresidentialFun-Audio.zip and another with all the transcripts called PresidentialFun-Transcript.zip .

We’re going to need those later.

Creating the necessary resources

You will need a free Azure account before you go further. Then, we’ll need to create a Speech service within the Azure portal.

step1.png

For this specific scenario, we’re going to choose a name, a location, and a pricing tier of S0 . We could go free, but the feature that we need is only available for the non-free offering.

step2.png

Once we have the resource created, we need to go to the Custom Voice website and start building our project.

Click on Custom Voice.

step3.png

Then, we’re going to create a new project.

It is essential at that point that we select Male and English (United States) . We’re going to leave that checkbox unchecked for now.

step4.png

Uploading the data

Start by clicking on the project you created. You will be taken to the Data tab. Click on Upload data

step5.png

In the next window, we’re going to choose Long audio + transcript . Click Next .

step6.png

Pick a name for your dataset and click Next .

step7.png

That is where our previously created zip files come in handy. Browse for PresidentialFun-Audio.zip for the Recordings , and PresidentialFun-Transcript.zip for the Transcripts section. Make sure to tick the checkbox and click Upload.

After that, all the speeches will be cut into utterances automatically. Once we parsed the data successfully, we are ready to train the model.

Warning: This may take an hour or two, depending on your dataset.

Training the model

Once the data step is in a Succeeded state, we’re ready to start training the model. Go to the Training tab, click Train model .

step8.png

Follow the steps that will allow you to pick a name for your model, as well as the data you want to use to train it. Review your options before clicking Train .

step9.png

From that point on, it will take a while. For me, this meant 18 hours of processing. It will incur fees at around $1 per hour. The more clean utterances that you have, the more time the model will take to train.

Testing the model

With your completed model, it’s time to start testing. It is time to have fun.

Let’s say that I that point, I went all-in on Dad Jokes.

step10.png

Here’s a sample.

What’s next

At that point, we can either keep on refining the model with more data or, if we’re happy with what we got, deploy it and have it available through an API.

If you happen to try this, please send me your best dad jokes with your favorite celebrity!


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK