How to Do Speech Recognition in Python

 2 years ago
source link: https://hackernoon.com/how-to-do-speech-recognition-in-python-bk1234w9

Mike Wolfe (@miketechgame)

Software Developer, Tech Enthusiast, Runner.

In my free time, I am attempting to build my own smart home devices. One feature they will need is speech recognition. While I am not certain yet as to how exactly I want to implement that feature, I thought it would be interesting to dive in and explore different options. The first I wanted to try was the SpeechRecognition library.

Why the hard way?

To make a long story short, this tutorial is going to be a little different. There were several errors I had to deal with, and I even had to redirect my focus. That being said, the coding portion is simple: only a few lines of code get it working. The installation took time and effort, but with research it was manageable. Instead, the issue was in the systems I decided to use. For example, the first attempt was on an Ubuntu server. Nothing wrong with that, but the default input device could not be changed, seeing as the code was being run through SSH. I would have had to go to the server and plug everything in directly. Again, nothing wrong with that. I was just hoping for an easier option. Feeling particularly adventurous and being a little too lazy to plug into the server directly, I tried a few different machines.

This tutorial will be a little different than my previous posts. For this, I am first going to share the working installation code on a local Ubuntu machine, which is what I ended up using. After that, I will talk about the other machines I attempted, where I found my issues, and when I decided to switch. Hopefully, that will help anyone using other machines. Or perhaps someone will know more about the issues I encountered but did not take the time to see through.

Installing SpeechRecognition

To use the SpeechRecognition library in our code, we need to install two packages: SpeechRecognition itself and PyAudio, which provides microphone access. We will start with the main package:

sudo pip3 install SpeechRecognition

If you try to run code now, you will get an error about PyAudio not being found. Installing it should have followed exactly the same format, but I was missing some system packages, and attempting to install PyAudio threw an error. Installing the packages below should remove that error. I did not have to update apt at that point, but it does not hurt to run an update first.

sudo apt-get install libasound-dev portaudio19-dev libportaudio2 libportaudiocpp0

With that out of the way, you should be good to install PyAudio:

sudo pip3 install PyAudio
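
Before moving on, it can help to confirm that both installs actually landed in the Python environment you plan to use. The following is only a minimal sketch: the helper name missing_modules is made up for illustration, and it assumes the import names speech_recognition and pyaudio, which are what the two packages expose.

```python
import importlib.util


def missing_modules(module_names):
    """Return the names from module_names that cannot be found for import."""
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # "speech_recognition" and "pyaudio" are the import names of the
    # SpeechRecognition and PyAudio packages installed above.
    missing = missing_modules(["speech_recognition", "pyaudio"])
    if missing:
        print("Still missing:", ", ".join(missing))
    else:
        print("Both packages are importable.")
```

If a name is reported as missing, the pip3 used for the install may belong to a different Python than the one running the script.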

Coding the Speech Recognizer

As mentioned previously, very few lines of code are required to get this up and running.

First, you must import the SpeechRecognition library:

import speech_recognition as speech

We gave the library an alias so we can reference it more simply later. Now, we can create an instance of the Recognizer class:

sound = speech.Recognizer()

Next, we need to let the Python file hear what we are saying, which is why we needed PyAudio. For live speech, we need to set up a microphone. Note that we will not put this in a loop, so we can only speak to the application once, whether that is a single word or a sentence. This recognizer is only a test, so we do not need to speak multiple times. We set up a microphone first, give it an alias, then instruct the Recognizer we created earlier to listen.

with speech.Microphone() as audio:
    said = sound.listen(audio)

Now, because our microphone could be unclear, or even the speech itself, we will need to set up a “try” to determine whether the Recognizer was able to understand. We will use the recognize_google function, which sends the audio to Google's speech API, so an internet connection is required. For privacy's sake, I would not use this function in any home application. However, while just testing what Python can do, it is good enough for now. The function takes the recorded audio to recognize, the language, and whether all guesses should be returned.

At this time, we want to see all potential guesses, and the language will be English (the code uses en-IN, Indian English). Either of these could be different for you, which is why they are specified. If the phrase was recognized, we print the results. If it could not be understood, we print a message instead. This is done with an “except” clause, which catches the error raised on failure so we can report that the speech was not understood.

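try:
    # show_all=True returns the raw response with every guess
    print(sound.recognize_google(said, language='en-IN', show_all=True))
except speech.UnknownValueError:
    # raised when the speech could not be understood
    print("Could not understand. Please repeat.")
except speech.RequestError as error:
    # raised when the Google API cannot be reached
    print("Recognition request failed:", error)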
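
Because show_all is set to True, what gets printed is the raw response rather than a single string. The sketch below shows one way to pull the most likely phrase out of it. It assumes the response is a dictionary with an "alternative" list of guesses, each holding a "transcript" and sometimes a "confidence"; that shape is what I understand the library to return, but treat it as an assumption. Depending on the library version, unintelligible audio may also come back as an empty result instead of raising an error, so the helper (best_transcript, a name made up here) handles that case too.

```python
def best_transcript(result):
    """Pick the most confident transcript from a show_all=True result.

    Assumes the shape {"alternative": [{"transcript": ..., "confidence": ...}]}.
    Returns None when there are no alternatives (for example an empty list).
    """
    if not isinstance(result, dict):
        return None
    alternatives = result.get("alternative", [])
    if not alternatives:
        return None
    # Alternatives without a "confidence" key are treated as 0.0, and ties
    # resolve to the earliest entry in the list.
    best = max(alternatives, key=lambda alt: alt.get("confidence", 0.0))
    return best.get("transcript")
```

Passing the value returned by recognize_google into best_transcript gives either one phrase or None, which is easier to act on than the full response.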

Now, all you must do is run the application.
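
For reference, the pieces from this section can be collected into one file. This is a sketch rather than the exact script I ran: the function names recognize_once and main are made up here, and the import sits inside main only so the helper can be exercised without a microphone attached.

```python
def recognize_once(recognizer, source, language="en-IN"):
    """Listen on an already opened source once and return the raw guesses."""
    audio = recognizer.listen(source)
    return recognizer.recognize_google(audio, language=language, show_all=True)


def main():
    # Imported here so recognize_once can be reused without audio hardware.
    import speech_recognition as speech

    sound = speech.Recognizer()
    try:
        with speech.Microphone() as source:
            print("Say something...")
            print(recognize_once(sound, source))
    except speech.UnknownValueError:
        print("Could not understand. Please repeat.")
    except speech.RequestError as error:
        print("Could not reach the recognition service:", error)


# Call main() to run a single listen-and-recognize cycle.
```

Adding a call to main() at the bottom of the file and running it with python3 should listen once, print the guesses, and exit.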

With our code up and running, we can now talk about what gave me issues on different machines.

Working on an Ubuntu Server

As mentioned before, the issue was that, when connected to the server from a separate machine, I was unable to change the default input device. This would not have been an issue had I gone to the server and plugged in the microphone directly. Other than the issue with the input, the installation process was the same, as the server also ran Ubuntu 16.04.
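
One thing I did not try at the time is picking the input device from Python instead of changing the system default. The library's Microphone class accepts a device_index argument, and Microphone.list_microphone_names() returns the device names in index order. The sketch below is untested against the server setup described here, so whether it would have solved the SSH case is an open question. The helper names and the "headset" keyword are made up for illustration.

```python
def find_device_index(device_names, keyword):
    """Return the index of the first device whose name contains keyword, else None."""
    keyword = keyword.lower()
    for index, name in enumerate(device_names):
        if name is not None and keyword in name.lower():
            return index
    return None


def open_microphone(keyword):
    """Build a Microphone for the first device whose name matches keyword."""
    import speech_recognition as speech  # needs SpeechRecognition and PyAudio

    names = speech.Microphone.list_microphone_names()
    index = find_device_index(names, keyword)
    # device_index=None makes the library fall back to the default input device.
    return speech.Microphone(device_index=index)
```

For example, open_microphone("headset") could replace speech.Microphone() in the with statement. The name list may include output devices as well, so the keyword should be specific enough to match an input.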

Working over WSL

The next system I used was a Windows machine running WSL (Windows Subsystem for Linux). It too used Ubuntu 16.04, so the installation process was the same. However, when it came to using the microphone, WSL is not as easy as plug in and go. To control the microphone from the Ubuntu terminal app, PulseAudio needed to be installed. To do this, the repository was added first:

sudo add-apt-repository ppa:therealkenc/wsl-pulseaudio

From there, a regular install could be run:

sudo apt-get install pulseaudio

PulseAudio is a network-capable sound server that runs on Linux and other POSIX systems. Like other services, you must start it and check its status. First, stop any instance that is already running:

pulseaudio -k

If not already on, you can now start PulseAudio:

pulseaudio --start

Next, you will have to look at the audio devices available. PulseAudio calls output devices sinks and input devices sources. I listed the sinks and, in my case, had only one device, which was the headset with a microphone:

pacmd list-sinks

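Now we also have the index, which is what we needed to set the default device. Keep in mind that sinks are outputs; the microphone side is set with pacmd set-default-source and an index from pacmd list-sources. This is the command I ran: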

pacmd set-default-sink 0
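
Reading the index off the listing by eye works for one device, but it could also be scripted. The sketch below parses the text printed by pacmd list-sinks or pacmd list-sources, assuming each device starts with an "index: N" line (prefixed with "*" for the current default) followed by a "name: <...>" line. That matches the output as I know it, but treat the format as an assumption. The function names are made up for illustration.

```python
import re
import subprocess

# Matches lines such as "  * index: 0" or "    index: 1"; "*" marks the default.
INDEX_LINE = re.compile(r"^\s*(\*?)\s*index:\s*(\d+)\s*$")
NAME_LINE = re.compile(r"^\s*name:\s*<(.+)>\s*$")


def parse_pacmd_devices(listing):
    """Return (index, name, is_default) tuples from pacmd listing text."""
    devices = []
    for line in listing.splitlines():
        index_match = INDEX_LINE.match(line)
        if index_match:
            is_default = index_match.group(1) == "*"
            devices.append([int(index_match.group(2)), None, is_default])
            continue
        name_match = NAME_LINE.match(line)
        # Only the first name line after an index belongs to that device.
        if name_match and devices and devices[-1][1] is None:
            devices[-1][1] = name_match.group(1)
    return [tuple(device) for device in devices]


def list_sources():
    """Run pacmd list-sources and parse its output (requires PulseAudio)."""
    output = subprocess.run(["pacmd", "list-sources"],
                            capture_output=True, text=True, check=True).stdout
    return parse_pacmd_devices(output)
```

Calling list_sources() on a machine with PulseAudio running would give the indexes to pass to pacmd set-default-source.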

Please note that you may have to run the PulseAudio start command again; I had to run it before every command. This was the final step, so the code should now run. However, running it produced yet another error. The description is lengthy, but the main error is:

I dug in to find out more about this error, although it did not seem to have much documentation behind it. Stack Overflow was one of the only sites I found with usable information on it, and it seems others were having the same issue. This is where I stopped for this implementation.

Looking back now, someone had mentioned using an X server. I wonder whether running Xming would have made it work. But, oh well. Perhaps I will give it a go another time. Leaving this version, I moved to my next machine.

Working on Fedora

As usual, the very first thing to do was install SpeechRecognition via pip3. PyAudio still had prerequisites to install, but they are different on Fedora. Remember that Fedora uses dnf as its package manager rather than apt-get:

sudo dnf install portaudio-devel redhat-rpm-config

These are not the only packages required. We must also install the Python development package:

sudo dnf install python3-devel
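
To keep the two sets of prerequisites side by side, here is a small sketch that builds the install command for either system. The package names are the ones used in this article; the dictionary keys and the function name install_command are made up for illustration, and other distributions or releases may need different packages.

```python
# System packages needed before "pip3 install PyAudio", as used in this article.
PREREQUISITES = {
    "ubuntu": ("apt-get", ["libasound-dev", "portaudio19-dev",
                           "libportaudio2", "libportaudiocpp0"]),
    "fedora": ("dnf", ["portaudio-devel", "redhat-rpm-config",
                       "python3-devel"]),
}


def install_command(distro):
    """Build the prerequisite install command for a supported distro."""
    manager, packages = PREREQUISITES[distro.lower()]
    return " ".join(["sudo", manager, "install"] + packages)
```

Calling install_command("fedora") returns the two Fedora commands above merged into a single line.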

Now that the prerequisites are installed, go ahead and install PyAudio via pip3 just like on the previous machines. With everything installed, I ran the code. Another error:

This error seemed to be a little more complicated to get information on. Some people were thinking it just needed a restart, some people never got it working. In either case, it was difficult to find documentation.

Now, had I tried harder, looked longer, or dedicated a little more effort to this, it would probably have been simple enough to solve. However, this was just an experimental project, and I did not want to dedicate too much time to it.

So, this is where I stopped trying on Fedora. Perhaps for the better, as I realized I had an Ubuntu machine and could just run the code locally. That is what I ended up doing, and I had no issues with it.

Conclusion

At the end of the day, we got something up and working, and I think it was rather interesting to mess around with. The code we created should be used only for testing, however, as the recognizer sends the recorded audio to Google. We would not want to use Google calls in any application where privacy matters.

Unlike my usual posts, this one also covered the errors I came across on different machines. Although they are likely solvable, I did not dedicate much time to solving them, and therefore some were left unresolved. In the long run, we did get the code up and running. Either way, it was an interesting journey, and our voices were recognized by a Python library!

Hopefully, you will find some use in seeing the spots where I went wrong. Yes, the mistakes are frustrating, and roadblocks are as well. However, every mistake can be insightful. We learned what to do, what not to do, where to go when stuck, and even when to just move on if able. Noting the differences in installing certain packages on Ubuntu versus others in Fedora was the most interesting portion in my opinion. It took a little research, but nothing was more than we could handle. So, I thank you for joining this voice recognition adventure with me. Until next time, cheers!

Previously published at https://python.plainenglish.io/speechrecognition-in-python-df4e56fecf51
