Demystifying speech recognition with Project DeepSpeech

Vigneshwer Dhinakaran (~dvigneshwer)



Description:

Pitch:

Our voices are no longer a mystery to speech recognition (SR) software; the technology powering these services has amazed people with its ability to understand us. This talk covers the inner workings of state-of-the-art SR algorithms, with live demos of Project DeepSpeech.

Description:

One forecast claims that "50% of all searches will be voice searches by 2020". The world's technology giants have bet heavily on voice search, personal digital assistants, IoT devices, and related services. Solving speech recognition is a herculean task, given the complexity of data such as the human voice.

The talk will cover a brief history of speech recognition algorithms and the challenges of building these systems, then explain how to build an advanced speech recognition system using deep learning. For illustration, we will take a deep dive into Project DeepSpeech, an open source speech-to-text engine developed by Mozilla Research, based on Baidu's Deep Speech research paper and implemented with Google's TensorFlow library.

Speech recognition is not only about the technology. There are broader concerns and challenges around how these AI models are becoming part of our day-to-day lives, including their biases. The bigger question is the centralization of these AI services. Projects like Common Voice address this problem by enabling everyone to take part in this revolution. Part of the talk will focus on how this type of research should be approached with community and humanitarian benefits as the first priority.

Prerequisites:

  • Basic Python
  • Enthusiasm for ML & AI services
  • Interest in learning about speech recognition systems

Session Content:

  • Introduction to the main units of deep learning
  • Feature engineering techniques for audio data
  • DeepSpeech Architecture
  • Live demo of Project DeepSpeech
  • Common Voice initiative (what it is and why it is needed)
  • Community support details
  • Applications of speech recognition
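As a taste of the feature engineering topic above, here is a minimal sketch of one common audio feature: a log power spectrogram built from overlapping, windowed frames. It is a generic illustration, not DeepSpeech's actual feature pipeline. The function names, the frame and hop sizes, and the synthetic 1 kHz test tone are all assumptions made for this example, and the naive DFT is used only to keep the code dependency-free.

```python
import cmath
import math


def frame_signal(samples, frame_len, hop_len):
    """Split samples into overlapping frames, dropping any trailing partial frame."""
    return [
        samples[start:start + frame_len]
        for start in range(0, len(samples) - frame_len + 1, hop_len)
    ]


def hamming(n):
    """Hamming window of length n (n >= 2)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def power_spectrum(frame):
    """Naive O(n^2) DFT; returns the power of bins 0..n//2."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(frame)
        )
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum


def log_spectrogram(samples, frame_len=64, hop_len=32, eps=1e-10):
    """One row of log power values per frame; eps avoids log(0)."""
    window = hamming(frame_len)
    features = []
    for frame in frame_signal(samples, frame_len, hop_len):
        windowed = [x * w for x, w in zip(frame, window)]
        features.append([math.log(p + eps) for p in power_spectrum(windowed)])
    return features


# Hypothetical input: a 1 kHz sine sampled at 8 kHz (not real speech).
SAMPLE_RATE = 8000
FRAME_LEN = 64
tone = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE) for t in range(256)]

features = log_spectrogram(tone, frame_len=FRAME_LEN, hop_len=32)
# Strongest frequency bin in the first frame, converted to Hz.
peak_bin = max(range(len(features[0])), key=lambda k: features[0][k])
peak_hz = peak_bin * SAMPLE_RATE / FRAME_LEN
```

Each row of `features` describes a short slice of audio, which is the kind of per-frame input a speech model consumes. In practice a library FFT would replace the naive DFT used here.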

Key Takeaways:

  • Unravel the mystery behind the AI that powers speech recognition in services such as Siri and Google Assistant
  • Learn about the various ways to contribute to Project DeepSpeech and the Common Voice project
  • Get introduced to the major units of deep learning and the state-of-the-art DL architectures powering speech-to-text applications
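To make "units of deep learning" concrete, here is a minimal sketch of one fully connected layer with a clipped ReLU activation, written in plain Python. The cap value of 20 follows the Deep Speech paper linked under the project links; the function names, weights, biases, and inputs are made-up toy values, and this is not code from the DeepSpeech repository.

```python
def clipped_relu(z, cap=20.0):
    """Rectified linear unit clipped to the range [0, cap]."""
    return min(max(0.0, z), cap)


def dense_layer(inputs, weights, biases):
    """One fully connected layer.

    Each output is clipped_relu(dot(weight_row, inputs) + bias).
    """
    return [
        clipped_relu(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


# Hypothetical toy values chosen to show all three activation regimes.
x = [1.0, -2.0, 0.5]
W = [
    [0.5, 0.25, -1.0],   # pre-activation -0.5 -> clamped up to 0.0
    [1.0, 0.5, 2.0],     # pre-activation  2.0 -> passed through
    [10.0, 0.0, 30.0],   # pre-activation 25.0 -> clamped down to 20.0
]
b = [0.0, 1.0, 0.0]

out = dense_layer(x, W, b)
```

Stacking several such layers, and adding recurrent layers that carry information across time steps, gives the general shape of the architectures the talk walks through.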

Tags:

AI, speech recognition, speech to text, machine learning, Python, tensorflow, deep learning, Voice search

Project links:

DeepSpeech:

  • https://github.com/mozilla/DeepSpeech
  • https://arxiv.org/abs/1412.5567

Common Voice:

  • https://voice.mozilla.org/
  • https://voice.mozilla.org/en/data

Speaker Info:

Vigneshwer is an innovative machine learning researcher with an artistic perception of technology and business and several years of experience developing robust machine learning solutions for video and text analytics problems. Vigneshwer has played key roles in analyzing problems, creating hypothesis matrices, and delivering novel algorithms and data-driven solutions for many Fortune 500 companies. Vigneshwer is also an open source aficionado, an official Mozilla TechSpeaker, and the author of the Rust Cookbook.

Speaker Links:

Github | Website | Facebook | Twitter | LinkedIn | Talks

Section: Data science
Type: Talks
Target Audience: Beginner