Speech recognition using Python: how a computer can tell if you're angry
Computers can tell whether we're happy, sad, angry, or feeling any of several other emotions. They can understand what we're saying and answer back. How does all this magic happen? Teaching a program to analyze speech and understand it is called speech recognition. I'll talk about speech recognition and its various nuances, and how it is handled using Python. I'll also cover branches of speech recognition such as speech emotion recognition and text generation based on speech data, as well as speech recognition implementations on hardware. Here is a summary of what I will cover:
- Speech recognition: what it is and why it is needed, along with concepts like spectral analysis, MFCCs (Mel Frequency Cepstral Coefficients), Fourier transforms, and signal processing
- How Python can make speech recognition easier
- Branches and new areas of speech recognition, such as speech emotion recognition and sentiment analysis, and notable work in these fields over the past few decades
- How speech recognition models are built: acoustic and language models etc.
- Resources like blogs, libraries, toolkits etc. for studying and getting started with speech recognition models in Python
- Basic workflow and tips on how to create your first speech recognition model using Python
- A brief on various repositories of speech databases, how they can be accessed and prepared for input to speech models
- Speech recognition models implemented on FPGAs (hardware), plus seminal and thoroughly comprehensive research papers on the latest work in the field
- Other media such as video data and facial emotion recognition, with resources for studying them further
- Applications and future scope, closing remarks
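As a small taste of the spectral-analysis and Fourier-transform concepts in the outline above, here is a minimal sketch using only NumPy. It substitutes a synthetic 440 Hz tone for a real speech frame (an assumption for illustration; real recordings would be loaded from audio files) and shows how the Fourier transform exposes the frequencies present in a signal:

```python
import numpy as np

# Synthesize one second of a 440 Hz sine wave at a 16 kHz sampling
# rate -- a stand-in for a real recorded speech signal.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
signal = np.sin(2 * np.pi * 440 * t)

# The real-input FFT gives the signal's frequency-domain representation;
# its magnitude shows how much energy each frequency carries.
spectrum = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(len(signal), d=1 / sample_rate)

# The dominant frequency should match the tone we synthesized.
dominant = freqs[np.argmax(spectrum)]
print(dominant)  # 440.0
```

Speech pipelines apply exactly this idea to short overlapping frames of audio, producing a spectrogram that downstream models consume.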
I will cover the basics of how speech is read, processed and quantified, concepts like the Fourier transform and spectral analysis, the various Python libraries and resources available for these tasks, and how you can easily build your own speech recognition system. Perhaps an Alexa 2.0?
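To make the "processed and quantified" step concrete, here is a simplified, NumPy-only sketch of the standard textbook MFCC pipeline for a single audio frame: window, power spectrum, triangular mel filterbank, log energies, then a DCT-II. The helper name `mfcc_frame` is hypothetical, and production code would typically use a library such as librosa instead:

```python
import numpy as np

def hz_to_mel(hz):
    # Standard mel-scale conversion: compresses high frequencies
    # the way human pitch perception does.
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mfcc_frame(frame, sample_rate, n_filters=26, n_coeffs=13):
    """Compute MFCCs for one audio frame (simplified sketch)."""
    # 1. Power spectrum of the Hamming-windowed frame.
    windowed = frame * np.hamming(len(frame))
    power = np.abs(np.fft.rfft(windowed)) ** 2

    # 2. Triangular filters spaced evenly on the mel scale,
    #    spanning 0 Hz up to the Nyquist frequency.
    mel_points = np.linspace(hz_to_mel(0.0),
                             hz_to_mel(sample_rate / 2.0),
                             n_filters + 2)
    bins = np.floor((len(frame) + 1) * mel_to_hz(mel_points)
                    / sample_rate).astype(int)
    fbank = np.zeros((n_filters, len(power)))
    for i in range(n_filters):
        left, center, right = bins[i], bins[i + 1], bins[i + 2]
        for j in range(left, center):
            fbank[i, j] = (j - left) / max(center - left, 1)
        for j in range(center, right):
            fbank[i, j] = (right - j) / max(right - center, 1)

    # 3. Log filterbank energies, then a DCT-II to decorrelate
    #    them into cepstral coefficients.
    energies = np.log(fbank @ power + 1e-10)
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_coeffs), 2 * n + 1)
                 / (2 * n_filters))
    return dct @ energies

# Try it on a synthetic 25 ms "frame" of a 300 Hz tone at 16 kHz.
sr = 16000
frame = np.sin(2 * np.pi * 300 * np.arange(400) / sr)
coeffs = mfcc_frame(frame, sr)
print(coeffs.shape)  # (13,)
```

These 13 coefficients per frame are a compact, perceptually motivated representation of the sound, and they are a very common input to both speech recognition and speech emotion recognition models.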
Basic knowledge of Python and data science should suffice.
I am a third-year undergraduate at Delhi Technological University. I am passionate about data science and machine learning, and have worked on several projects and authored research papers in these areas. My research centers on ensemble learning methods, and in recent months I've taken an interest in speech recognition systems. I have worked with professors across several universities, and am always up for discussing Python, machine learning and data science with anyone.