Voice Cloning using Deep Learning

Kurian Benoy (~kurianbenoy)


Description:

Imagine someone calling your mother in your voice and saying anything they want. How frightening would that be? With recent advances in deep learning this is now possible, using Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis (SV2TTS) together with a vocoder that works in real time.

SV2TTS is a three-stage deep learning framework that creates a numerical representation of a voice from just a few seconds of audio, and then uses it to condition a text-to-speech model trained to generalize to new voices. I will be talking about this and similar architectures for voice cloning and how they work. If I manage to implement this research paper in time, I will give a live demo of generating speech in a cloned voice from given text during the poster presentation.
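The three stages described above (speaker encoder, synthesizer, vocoder) can be sketched as a pipeline. This is a minimal toy sketch to show the data flow only: the function bodies are stand-ins I wrote for illustration, not the real models, and the dimensions (256-dim embedding, 80 mel bands) follow the paper's conventions but the logic inside each stub is hypothetical.

```python
import numpy as np

def speaker_encoder(reference_audio: np.ndarray) -> np.ndarray:
    """Stage 1: map a few seconds of reference audio to a fixed-size
    speaker embedding (a 256-dim vector, as in the paper).
    Toy stand-in: average frames and normalize to unit length."""
    frames = reference_audio.reshape(-1, 256)
    embedding = frames.mean(axis=0)
    return embedding / np.linalg.norm(embedding)

def synthesizer(text: str, speaker_embedding: np.ndarray) -> np.ndarray:
    """Stage 2: a Tacotron-style model generates a mel spectrogram from
    text, conditioned on the speaker embedding.
    Toy stand-in: random frames shifted by a scalar from the embedding."""
    n_frames = 10 * len(text)   # toy: duration scales with text length
    n_mels = 80
    rng = np.random.default_rng(0)
    mel = rng.standard_normal((n_frames, n_mels))
    return mel + speaker_embedding[:n_mels].mean()

def vocoder(mel_spectrogram: np.ndarray) -> np.ndarray:
    """Stage 3: a neural vocoder turns mel frames into a waveform.
    Toy stand-in: repeat each frame's mean for a fixed hop length."""
    hop_length = 200            # toy: 200 samples per mel frame
    return np.repeat(mel_spectrogram.mean(axis=1), hop_length)

# A few "seconds" of fake reference audio for the cloned speaker.
reference = np.random.default_rng(1).standard_normal(256 * 40)
embedding = speaker_encoder(reference)          # (256,) speaker vector
mel = synthesizer("hello world", embedding)     # (frames, 80) spectrogram
waveform = vocoder(mel)                         # 1-D audio signal
print(embedding.shape, mel.shape, waveform.shape)
```

In the real system each stage is a trained neural network (the CorentinJ repository linked below implements all three), but the interfaces between stages are essentially what this sketch shows: audio in, embedding out; text plus embedding in, spectrogram out; spectrogram in, waveform out.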

Prerequisites:

absolutely nothing :P

Content URLs:

  • Planned contents of the PyCon poster
  • My summary of the original research paper, which accounts for most of the poster's content, can be found here
  • https://arxiv.org/pdf/1806.04558.pdf
  • https://github.com/CorentinJ/Real-Time-Voice-Cloning

Speaker Info:

Kurian Benoy is an open source contributor at CloudCV and DVC. He is the lead organiser of School of AI, Kochi, and an AI enthusiast working on deep learning and computer vision. Kurian is a FOSSASIA OpenTechNights winner and gave a talk at the FOSSASIA OpenTechSummit about the [keralarescue.in team](https://www.youtube.com/watch?v=2RzImb5JwMA).

Speaker Links:

Section: Data Science, Machine Learning and AI
Type: Poster
Target Audience: Intermediate
Last Updated: