Building a speaker recognition system
by Achintya Prakash (speaking)
- Scientific Computing
Obtain an understanding of speaker recognition and how it works
Set up a basic speaker authentication system
Briefly discuss further improvements that can be made to boost accuracy
Ever wanted to program your door to open at the sound of your voice? Or control everything in your room securely by just talking to it? Or make personalized voice controlled applications to improve accessibility? Whatever your goal, the first steps to any of these would be making your machine smart enough to know who’s talking.
Speaker authentication refers to a machine recognizing a person based on their voice alone. While the field itself is deeply rooted in statistics and mathematics, implementing speaker authentication systems has become much easier and more accessible thanks to the abstractions provided by existing libraries.
This talk will first familiarise you with the basics of why speaker authentication works and what makes it tick. The talk will roughly follow this roadmap:
We'll briefly discuss:
- extracting features from sound
- training Gaussian Mixture Models to recognise human speech and individual speakers
- stepping through the verification workflow: extracting features from voice samples, removing silences from the audio, comparing against the trained models, and getting a normalized score
- setting a decision threshold based on EER diagrams
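The core of the workflow above can be sketched in a few lines. This is a minimal, self-contained illustration using scikit-learn's GaussianMixture: random arrays stand in for MFCC feature matrices (in the talk these would come from a real feature extractor), a background GMM plays the role of the Universal Background Model (UBM), and the score is a simple average log-likelihood ratio. The data, model sizes, and the `llr_score` helper are all illustrative choices, not the talk's exact setup.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Stand-ins for MFCC feature matrices (frames x coefficients).
# In practice these come from a feature extractor such as SPro.
speaker_feats = rng.normal(loc=1.0, scale=0.5, size=(500, 13))
background_feats = rng.normal(loc=0.0, scale=1.0, size=(2000, 13))

# Universal Background Model: a GMM trained on pooled multi-speaker data.
ubm = GaussianMixture(n_components=8, covariance_type="diag", random_state=0)
ubm.fit(background_feats)

# Target speaker model trained on that speaker's enrolment data.
target = GaussianMixture(n_components=8, covariance_type="diag", random_state=0)
target.fit(speaker_feats)

def llr_score(model, ubm, feats):
    """Average log-likelihood ratio of a test utterance: target vs. UBM."""
    return model.score(feats) - ubm.score(feats)

genuine = rng.normal(1.0, 0.5, size=(300, 13))   # same "speaker"
impostor = rng.normal(0.0, 1.0, size=(300, 13))  # different "speaker"

print(llr_score(target, ubm, genuine))   # high score: accept
print(llr_score(target, ubm, impostor))  # low score: reject
```

Comparing the genuine and impostor scores against a threshold (chosen from the EER diagram) is what turns the raw likelihood ratio into an accept/reject decision.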
We'll then hack some open source libraries (SPro and Alize) using Python scripts, and get our own basic version of an authentication system working. The accuracy of the recognition depends on a variety of factors, including the data set and voice quality, and may not be the greatest, which leads into the closing discussion of the talk.
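Driving the command-line tools from Python is mostly a matter of chaining subprocess calls. The sketch below builds the command sequence for one run; the tool names are real (sfbcep is SPro's cepstral feature extractor, and EnergyDetector, TrainWorld, TrainTarget, and ComputeTest are programs from the Alize/LIA_RAL suite), but the exact flags, argument order, and config-file contents are assumptions, so check each tool's documentation before running.

```python
import subprocess

def build_pipeline(wav, feat, cfg_dir="cfg"):
    """Return the sequence of commands for one enrolment-and-test run.

    Flags and config paths are illustrative, not the tools' exact CLI.
    """
    return [
        # 1. Extract cepstral features from the raw audio with SPro.
        ["sfbcep", wav, feat],
        # 2. Drop silent frames with Alize's energy-based detector.
        ["EnergyDetector", "--config", f"{cfg_dir}/EnergyDetector.cfg"],
        # 3. Train the Universal Background Model on pooled data.
        ["TrainWorld", "--config", f"{cfg_dir}/TrainWorld.cfg"],
        # 4. Adapt a per-speaker model from the UBM.
        ["TrainTarget", "--config", f"{cfg_dir}/TrainTarget.cfg"],
        # 5. Score test utterances against the speaker models.
        ["ComputeTest", "--config", f"{cfg_dir}/ComputeTest.cfg"],
    ]

def run_pipeline(wav, feat):
    """Execute each step, stopping at the first failure."""
    for cmd in build_pipeline(wav, feat):
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
```

Keeping the command construction separate from the execution makes it easy to inspect or log the pipeline without the SPro/Alize binaries installed.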
The talk will close with a brief discussion of what can be done to further improve the accuracy of authentication. Possible topics include implementing SVMs or i-vectors, using JFA or NAP for channel compensation, applying different types of normalization, and using impostor models for score normalization.
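To make one of those improvements concrete, here is a minimal sketch of Z-norm, a common impostor-based score normalization: a raw verification score is centred and scaled by the statistics of scores that a cohort of impostor utterances obtains against the same target model. The function name and inputs are illustrative.

```python
import statistics

def znorm(raw_score, impostor_scores):
    """Z-normalize a verification score against an impostor cohort.

    impostor_scores: scores of known non-target utterances against
    the same speaker model as raw_score.
    """
    mu = statistics.mean(impostor_scores)
    sigma = statistics.stdev(impostor_scores)
    return (raw_score - mu) / sigma
```

Because the normalization is per-model, it compensates for speaker models that are systematically "easy" or "hard" to score against, which makes a single global decision threshold more reliable.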
The talk doesn't aim to be highly mathematical or overly academic. Rather, the goal is for you to leave with enough of an understanding of basic speaker recognition to begin hacking away at it at once.
My name is Achintya Prakash. I interned at Bell Labs, where I wrote my bachelor's thesis on security attacks against speaker authentication systems.
I spent four fun years at BITS Pilani, Goa campus, where I studied Computer Science. I've been coding in Python for a couple of years now, and gave a lecture series on introductory Python to colleges across Goa as a member of the Google Technical Users Group.