How to Do Speech Recognition in Python

May 21st 2021 new story

Mike Wolfe (@miketechgame)

Software Developer, Tech Enthusiast, Runner.

In my free time, I am attempting to build my own smart home devices. One feature they will need is speech recognition. While I am not certain yet as to how exactly I want to implement that feature, I thought it would be interesting to dive in and explore different options. The first I wanted to try was the SpeechRecognition library.

Why the hard way?

To make a long story short, this tutorial is going to be a little bit different. There were several errors I had to deal with, and I even had to redirect my focus. That being said, the coding portion is simple: only a few lines of code are needed to get it working. The installation took time and effort, but with research it was manageable. Instead, the issue was in the systems I decided to use. For example, the first attempt was on an Ubuntu server. Nothing wrong with that, but the default input device could not be changed, seeing as the code was being run through ssh. I would have had to go to the server and plug everything in directly. Again, nothing wrong with that. I was just hoping for an easier option. Feeling particularly adventurous and being a little too lazy to plug into the server directly, I tried a few different machines.

This tutorial will be a little different than my previous posts. For this, I am first going to share the working installation code on a local Ubuntu machine, which is what I ended up using. After that, I will talk about the other machines I attempted, where I found my issues, and when I decided to switch. Hopefully, that will help anyone using other machines. Or perhaps someone will know more about the issues I encountered but did not take the time to see through.

Installing SpeechRecognition

To use the SpeechRecognition library in our code, we need to install both SpeechRecognition and PyAudio. We will start with the main package:

sudo pip3 install SpeechRecognition

If you try to run code now, you will get an error about PyAudio not being found. Installing it should have followed exactly the same format, but I was missing some system packages, and attempting to install PyAudio threw an error. Installing the packages below should remove that error. I did not have to update apt at that point, but it does not hurt to give it an update first.

sudo apt-get install libasound-dev portaudio19-dev libportaudio2 libportaudiocpp0

With that out of the way, you should be good to install PyAudio:

sudo pip3 install PyAudio
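
Before moving on, it can help to confirm that both packages are actually importable. The following is a minimal sketch and was not part of the original setup; the helper name missing_modules is made up for illustration. It relies on SpeechRecognition being imported as speech_recognition and PyAudio as pyaudio.

```python
import importlib.util


def missing_modules(names):
    """Return the module names from `names` that Python cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Check the two packages this tutorial depends on.
missing = missing_modules(["speech_recognition", "pyaudio"])
print("Missing:", missing if missing else "nothing, both packages were found")
```

If pyaudio shows up as missing, the system packages from the apt-get step are the first thing to re-check.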

Coding the Speech Recognizer

As mentioned previously, there are very few lines of code required to get this up and running.

First, you must import the SpeechRecognition library:

import speech_recognition as speech

We added an alias to the library in order to reference it later in a simpler way. Now, we can create an instance of the Recognizer class:

sound = speech.Recognizer()

Next, we will need to allow the Python file to hear what we are saying. This is the reason we needed PyAudio as well. For live speech, we will need to set up a microphone. Note, we will not put this in a loop, so we will only be able to speak to the application one time, whether that is a single word or a sentence. Nonetheless, this recognizer is only a test, so we will not need to speak multiple times. We will set up a microphone first, give it an alias, then instruct the Recognizer we set up earlier to listen.

with speech.Microphone() as audio:
    said = sound.listen(audio)
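
If the room is noisy or nobody speaks, a bare listen call can wait for a long time. Here is a hedged sketch of one way to handle that, using the library's adjust_for_ambient_noise method and the timeout argument of listen. The function names capture_phrase and listen_once, and the default values of 1 and 5 seconds, are assumptions chosen for illustration rather than anything from the original code.

```python
def capture_phrase(recognizer, source, calibrate_seconds=1, timeout=5):
    """Calibrate for background noise, then record a single phrase.

    `recognizer` is expected to behave like speech_recognition.Recognizer
    and `source` like an opened speech_recognition.Microphone.
    """
    # Sample the background noise so quiet rooms and loud rooms both work.
    recognizer.adjust_for_ambient_noise(source, duration=calibrate_seconds)
    # With the real library, listen() raises WaitTimeoutError if no speech
    # starts within `timeout` seconds.
    return recognizer.listen(source, timeout=timeout)


def listen_once():
    """Open the default microphone and capture one phrase."""
    # Imported here so capture_phrase itself has no hard dependency.
    import speech_recognition as speech

    sound = speech.Recognizer()
    with speech.Microphone() as audio:
        return capture_phrase(sound, audio)
```

Calling listen_once() returns the captured audio, which can then be passed to the recognizer in the same way as said.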

Now, because our microphone could be unclear, or even the speech itself, we will need to set up a “try” to determine if the Recognizer was able to understand or not. We will use the recognize_google function, which sends the audio to Google, so an internet connection will be required. For privacy's sake, I would not use this function in any home applications. However, while just testing what Python can do, it will be good enough for now. The function takes the captured audio, the language, and a flag for whether all guesses should be returned or not.

At this time, we want to see all potential guesses, and the language will be English (the 'en-IN' code used below selects Indian English; 'en-US' is another option). Either of these could be different for you, which is why they are specified. If it did recognize the phrase, then we want to print the results. However, if it could not understand, we will want to print a message. This can be done with an “except”, which will catch the error raised when the speech was not understood, and we can print a message stating that.

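try:
    # show_all=True returns the raw response with every guess
    print(sound.recognize_google(said, language='en-IN', show_all=True))
except speech.UnknownValueError:
    # Raised by current library versions when speech is not understood
    print("Could not understand. Please repeat.")
except speech.RequestError as error:
    # Raised when the Google service cannot be reached
    print("Could not reach the recognition service:", error)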
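
Since we asked for all guesses, the printed value is the raw response rather than a plain string. The sketch below shows one way to pull out the most likely transcript. It assumes the response is a dictionary with an "alternative" list whose entries have a "transcript" and sometimes a "confidence", and that an empty list comes back when nothing was recognized. That shape is an assumption about the Google response, and the helper name best_transcript is made up for illustration.

```python
def best_transcript(result):
    """Pick the most confident transcript from a show_all style result."""
    # An unrecognized phrase is assumed to come back as an empty list.
    if not isinstance(result, dict):
        return None
    alternatives = result.get("alternative", [])
    if not alternatives:
        return None
    # Guesses without a confidence value are treated as 0.0.
    best = max(alternatives, key=lambda guess: guess.get("confidence", 0.0))
    return best.get("transcript")


# Hand-written sample in the assumed shape, not real service output.
sample = {
    "alternative": [
        {"transcript": "turn on the lights", "confidence": 0.92},
        {"transcript": "turn on the light"},
    ],
    "final": True,
}
print(best_transcript(sample))
```

A None result would then be the signal to ask the speaker to repeat.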

Now, all you must do is run the application.

With our code up and running, we can now talk about what gave me issues on different machines.

Working on An Ubuntu Server

As mentioned before, the issue was that from a separate machine connected to the server, I was unable to change the default input device. This would not have been an issue if I had gone to the server and plugged in the microphone directly. Other than the issue with the input, the installation process was the same, as it was also Ubuntu 16.04.
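
One thing I did not try is selecting the input device from Python instead of relying on the system default. The sketch below is untested on that server and may not help over ssh, but it shows the idea: the library can list microphone names, and the position of a name in that list is the device index accepted by Microphone. The helper names find_device_index and open_named_microphone are made up for illustration.

```python
def find_device_index(names, keyword):
    """Return the index of the first device whose name contains `keyword`."""
    for index, name in enumerate(names):
        # Guard against entries with no name.
        if name and keyword.lower() in name.lower():
            return index
    return None


def open_named_microphone(keyword):
    """Create a Microphone for the first input device matching `keyword`."""
    # Imported here so find_device_index has no hard dependency.
    import speech_recognition as speech

    names = speech.Microphone.list_microphone_names()
    index = find_device_index(names, keyword)
    if index is None:
        raise LookupError(f"No input device matching {keyword!r}; found: {names}")
    return speech.Microphone(device_index=index)
```

With a USB headset attached, open_named_microphone("headset") could then replace the plain speech.Microphone() call.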

Working over WSL

The next system I used was a Windows machine running WSL (Windows Subsystem for Linux). It too used Ubuntu 16.04, so the installation process was the same. However, when it came to using the microphone, WSL is not as easy as plug in and go. To control the microphone from the Ubuntu terminal app, PulseAudio needed to be installed. To do this, first, the repository was added:

sudo add-apt-repository ppa:therealkenc/wsl-pulseaudio

From there, a regular install could be run:

sudo apt-get install pulseaudio

PulseAudio is a network-capable sound server that runs on Linux and other systems. Like other services, you must start it and check its status. First, there is a command to stop any instance that is already running:

pulseaudio -k

If not already on, you can now start PulseAudio:

pulseaudio --start

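Next, you will have to look at the audio devices available. These devices are known as sinks (strictly speaking, sinks are PulseAudio's outputs, while inputs such as microphones are listed as sources by pacmd list-sources). In my case, I had only one, which was the headset with a microphone: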

pacmd list-sinks

Now we also have the index, which is what we needed. We can set the default device from here (for a microphone, the matching command is pacmd set-default-source):

pacmd set-default-sink 0

Please note, you may have to run the start command again on PulseAudio. I had to run it before every command. This was the final step, and the code should now run. However, when running the code there was yet another error. It is a lengthy description, but the main error is:

I dug in to find more about this error, although it did not seem to have much documentation behind it. Instead, it seemed as if StackOverflow was one of the only sites I found with usable information on it. It seems like others were having the same issue. This is where I stopped for this implementation.

Looking back now, it seems someone had mentioned using an X server. I am wondering now whether it would have worked if I had run Xming. But, oh well. Perhaps I will give it a go another time. Leaving this version, I moved to my next machine.

Working on Fedora

As usual, the very first thing to do was install SpeechRecognition via pip3. Before installing PyAudio, it is important to note that there were still prerequisites to install, but they are different on Fedora. Remember that Fedora uses dnf rather than apt-get, and the package names differ as well:

sudo dnf install portaudio-devel redhat-rpm-config

These are not the only packages required. We must also install the Python development package:

sudo dnf install python3-devel
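
To keep the two sets of prerequisites straight, here is a small sketch that picks the right command from the ID field of an os-release file. The commands are the ones used in this article (the two dnf commands are merged into one), other distributions are deliberately not covered, and the names PREREQUISITES, distro_id, and prerequisite_command are made up for illustration.

```python
# Prerequisite commands from this article, keyed by the os-release ID value.
PREREQUISITES = {
    "ubuntu": "sudo apt-get install libasound-dev portaudio19-dev "
              "libportaudio2 libportaudiocpp0",
    "fedora": "sudo dnf install portaudio-devel redhat-rpm-config python3-devel",
}


def distro_id(os_release_text):
    """Return the ID value from os-release style text, or None if absent."""
    for line in os_release_text.splitlines():
        # Match "ID=" exactly so that "ID_LIKE=" is not picked up.
        if line.startswith("ID="):
            return line.split("=", 1)[1].strip().strip('"').lower()
    return None


def prerequisite_command(os_release_text):
    """Return the install command for the detected distribution, if known."""
    return PREREQUISITES.get(distro_id(os_release_text))


# Hand-written sample text, not copied from a real machine.
print(prerequisite_command('NAME="Fedora Linux"\nID=fedora\nVERSION_ID=34\n'))
```

On a real machine, the text would come from reading /etc/os-release.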

Now that the prerequisites are installed, go ahead and install PyAudio via pip3 just like on the previous machines. With everything installed, I ran the code. Another error:

This error seemed to be a little more complicated to get information on. Some people were thinking it just needed a restart, some people never got it working. In either case, it was difficult to find documentation.

Now, if I had tried harder, maybe looked longer, or even just dedicated a little bit more effort to this, it would probably have been simple enough to solve. However, this was just an experimental project. For an experiment, I did not want to dedicate too much time to this.

So, this is where I stopped trying on Fedora. Perhaps for the better, as I realized I have an Ubuntu machine and could just run the code locally. That is what I ended up doing, and I had no issues with it.

Conclusion

At the end of the day, we got something up and working. I think it was rather interesting to mess around with. The code we created should be used only for test purposes, however, as the recognizer sends the recorded audio to Google. For privacy reasons, we would not want to rely on Google calls in any application meant for use in the home.

As a change from my usual format, we also talked about the errors I came across on different machines. Although they are likely solvable, I did not dedicate too much time to solving them, and therefore some were left unresolved. In the long run, we did get the code up and running. Either way, it was an interesting journey, and our voices were recognized by a Python library!

Hopefully, you will find some use in seeing the spots where I went wrong. Yes, the mistakes are frustrating, and roadblocks are as well. However, every mistake can be insightful. We learned what to do, what not to do, where to go when stuck, and even when to just move on if able. Noting the differences in installing certain packages on Ubuntu versus others in Fedora was the most interesting portion in my opinion. It took a little research, but nothing was more than we could handle. So, I thank you for joining this voice recognition adventure with me. Until next time, cheers!

Previously published at https://python.plainenglish.io/speechrecognition-in-python-df4e56fecf51
