Project in DSP using Python:
Recognition of words
“Hey Google. What’s the weather like today?”
This will sound familiar to anyone who has owned a smartphone in the last
decade. I can’t remember the last time I typed out an entire query on
Google Search. I simply ask the question, and Google lays out the full
weather forecast for me.
It saves me a ton of time and I can quickly glance at my screen and get back
to work. A win-win for everyone! But how does Google understand what I’m
saying? And how does Google’s system convert my query into text on my
phone’s screen?
This is where the beauty of speech-to-text models comes in. Google uses a
mix of deep learning and Natural Language Processing (NLP) techniques to
parse our query, retrieve the answer, and present it as both audio and
text.
The same speech-to-text concept is used in the other popular speech
recognition technologies out there, such as Amazon’s Alexa and Apple’s Siri.
The implementation details might vary from company to company, but the
overall idea remains the same.
I have researched this topic quite a bit because I wanted to understand
how I could build my own speech-to-text model using my Python and deep
learning skills. It’s a fascinating concept and one I wanted to share
with all of you.
So in this article, I will walk you through the basics of speech recognition
systems, which amounts to an introduction to signal processing. We will then
use this as the foundation when we implement our own speech-to-text model
from scratch in Python.