Speech Synthesis Engine for Generating Human-like Natural Voice

Rishikesh kumar (~rishikesh)




Can we make a machine talk or give a speech as naturally as a human? Can a digital personal assistant such as Siri or Alexa mimic my voice or respond in my own voice? Generating natural, human-like voice has long been a challenging research topic, but recent developments in speech synthesis using advanced deep learning techniques have made it achievable.

Speech synthesis is an integral part of any voice-driven application. Although standard methods can generate good-quality voice, in practice the output is still too robotic, emotionless, and far from an actual human voice. Recent AI developments in this field have made it possible to generate expressive, human-level voice. Recent papers such as WaveNet, Tacotron, and Deep Voice do well at accurately generating human voice and can even mimic a specific person's voice.

In this talk, I will cover the literature on voice synthesis and show how we can generate human-level voice without doing a PhD in speech processing.

Key components of the talk:

  1. Understanding the basic literature of speech synthesis.
  2. Components of a speech synthesis engine.
  3. How to create your own voice dataset.
  4. Building a basic text-to-speech engine using Tacotron 2.
  5. Applications of real-time speech synthesis.


Prerequisites:

  1. Basic knowledge of Python and Jupyter Notebook.
  2. Familiarity with machine learning components.
  3. Basic knowledge of linear algebra, probability distributions, and calculus.
  4. Knowledge of speech processing is a bonus.

Speaker Info:

I am Rishikesh, a Data Scientist at Humonics Global Pvt. Ltd. Apart from my job, I actively contribute to open source projects and speak at many data science communities such as PyData Delhi and the Delhi Kaggle Group. My areas of expertise are speech processing, data science, deep learning, and statistical modeling.

Speaker Links:

LinkedIn
GitHub

Id: 942
Section: Data science
Type: Talks
Target Audience: Intermediate
Last Updated: