Speech recognition converts spoken language into text, typically with deep neural network models that are exposed through an API. In this blog, I talk about how to convert speech into text using Python. This is done with the SpeechRecognition library and the PyAudio library.
Processing Steps for Speech Recognition:
The input signal (audio) is captured through a microphone.
The physical sound signal is converted into electrical pulses.
The electrical pulses are then converted into digital signals by an analog-to-digital converter.
Once the audio is digitized, a speech recognition model, reached through an API, can transcribe it into text.
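The digitizing step can be sketched in a few lines of standard-library Python. This is only an illustration: the function names digitize and tone, the 16 kHz sample rate, and the 16-bit depth are assumptions chosen for the example, and a pure sine wave stands in for the electrical signal coming from the microphone.

```python
import math


def digitize(signal, duration_s, sample_rate_hz=16000, bit_depth=16):
    """Sample a continuous signal and quantize each sample to a signed integer."""
    max_level = 2 ** (bit_depth - 1) - 1  # 32767 for 16-bit audio
    n_samples = int(duration_s * sample_rate_hz)
    samples = []
    for n in range(n_samples):
        t = n / sample_rate_hz  # time of the n-th sample, in seconds
        amplitude = max(-1.0, min(1.0, signal(t)))  # clip to [-1, 1]
        samples.append(round(amplitude * max_level))  # quantize
    return samples


def tone(t):
    """A 440 Hz sine wave standing in for the microphone's electrical signal."""
    return math.sin(2 * math.pi * 440 * t)


# 10 ms of audio at 16 kHz gives 160 integer samples
pcm = digitize(tone, duration_s=0.01)
print(len(pcm), min(pcm), max(pcm))
```

A list of integers like pcm is, in essence, what the recognition API receives once the microphone hardware and audio driver have done their work.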
Python Libraries:
pip install SpeechRecognition
import speech_recognition as sr
This library converts speech to text. The only class we need is the Recognizer class from the speech_recognition module. It provides a separate recognize_*() method for each supported speech API, so the API we choose determines which method we call.
recognize_google() # here I use the Google speech API
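Since the API is selected by calling a different method on the same Recognizer object, the choice can be kept in one place. The sketch below is a hypothetical helper, not part of the library: BACKENDS and get_transcriber are names invented for this example, and only the two method names recognize_google and recognize_sphinx come from SpeechRecognition itself.

```python
# Map a short backend name to the Recognizer method that calls it.
BACKENDS = {
    "google": "recognize_google",  # online, Google speech API
    "sphinx": "recognize_sphinx",  # offline, needs the pocketsphinx package
}


def get_transcriber(recognizer, backend="google"):
    """Return the bound recognize_*() method for the chosen backend."""
    if backend not in BACKENDS:
        raise ValueError("unknown backend: {!r}".format(backend))
    return getattr(recognizer, BACKENDS[backend])
```

With the library installed, this would be used as transcribe = get_transcriber(sr.Recognizer(), "google") followed by text = transcribe(audio), where audio is the recorded speech.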
Then we need to install the PyAudio library, which handles audio input from the microphone and audio output to the speaker. Here, it lets the SpeechRecognition library capture our voice through the microphone.
pip install PyAudio
Code:
# import library
import speech_recognition as sr

# Initialize recognizer class (for recognizing the speech)
r = sr.Recognizer()

# Use the microphone as the audio source,
# listen to the speech and store it in the audio variable
with sr.Microphone() as source:
    print("Start Talking")
    audio = r.listen(source)
    print("Thank You")

# recognize_*() methods raise RequestError if the API is unreachable and
# UnknownValueError if the speech cannot be understood, hence the exception handling
try:
    # using Google speech recognition
    print("Text : " + r.recognize_google(audio))
except sr.UnknownValueError:
    print("Sorry, the voice was not recognized")
except sr.RequestError as e:
    print("Could not reach the speech API: {}".format(e))
Recognition with different languages:
We need to pass the required language option to the recognize_google() method. For example, if we want to talk in Tamil, an Indian language, we can pass "ta-IN" as the language argument.
print("Text: "+r.recognize_google(audio, language="ta-IN"))
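When the spoken language is not known in advance, one option is to try several language codes in turn. The helper below is a hypothetical sketch: transcribe_first_match is a name invented for this example, and the caller supplies both the recognizer method and the exception class that signals "nothing understood". It only helps when the API returns no transcript at all for the wrong language; an API may also return an incorrect transcript instead of failing.

```python
def transcribe_first_match(recognize, audio, languages, not_understood):
    """Try each language code in order; return (code, text) for the first success."""
    for code in languages:
        try:
            return code, recognize(audio, language=code)
        except not_understood:
            continue  # no transcript for this language, try the next one
    return None, None  # none of the languages produced a transcript
```

With the library installed, a call could look like transcribe_first_match(r.recognize_google, audio, ["ta-IN", "en-IN"], sr.UnknownValueError).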
Conclusion:
This completes our speech recognition project with the Google speech recognition API. It is a basic building block for NLP (Natural Language Processing) projects, especially those that work with audio transcript data. In the future, it could be used to control machines and robots by interfacing with their controllers.