Emotion-based Customization of Voice Assistants

Megha Sharma (~megha480)


The penetration of voice-activated speakers powered by intelligent assistants, such as Amazon Alexa, Google Home and Apple’s Siri, is increasing with every passing day. It has been estimated that eight billion digital voice assistants will be in use by 2023. This high level of penetration gives us the opportunity to impact the lives of millions by contributing to this field. This realization dawned on us some time back and led to the start of "Mozart", a project aimed at personalizing voice assistants by giving them the ability to understand a person’s emotions.

Imagine you have had a hard day at work. You reach home and want to enjoy some music. You ask your voice assistant to play some music and it starts playing the most popular songs. But these may not suit your mood. The exhausting day at work has left you feeling low, and you want music that lifts your mood. How great would it be if the voice assistant were smart enough to tap into your emotions and play music accordingly? Our project is a simple realization of this concept. It could be expanded into something bigger, such as helping people who are dealing with emotional problems.

Our purpose in giving this talk is to encourage new ideas in this direction and to share the lessons we have learnt while working on this project. In the talk, we’ll cover the topics below and explain them with the help of our project.

  • Importance of demystifying sentiments/emotions: In this section, we’ll share what motivated us to pick this problem and how large the impact of solving it could be.
  • Deep Learning, Python and their magical powers: We’ll touch upon why and how Deep Learning is a saviour in scenarios like this. We’ll highlight the abstraction that Python libraries offer and how it significantly reduces development effort.
  • Basis of design choices: It’s always hard to decide on the right model and the libraries to implement it with. In this section, we’ll compare various deep learning architectures (mainly CNNs and RNNs) and implementation libraries across different design aspects.
  • Key Learnings: It’s the experience-sharing section! We’ll talk about the challenges we faced, the mistakes we made and what we learnt along the way. The major focus will be on sharing tips for tuning models using techniques like K-fold cross-validation and Grid Search in Python.
  • Demo: We’ll demo the Alexa skill that we built. It recognizes your mood and plays music to match it.
  • Future Scope: The intent of this section is to spark new, thought-provoking ideas in enthusiastic minds. We’ll try to highlight the potential of this field by discussing its future applications.
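
To give a flavour of the tuning techniques named under Key Learnings, here is a minimal sketch of K-fold cross-validation combined with a grid search, written with only the Python standard library. It is not the Mozart code: the one-dimensional "vocal energy" feature, the "calm"/"upbeat" labels, the tiny nearest-neighbour classifier and every function name are assumptions made up for illustration. In a real project one would more likely rely on a library such as scikit-learn (for example its GridSearchCV class) and a neural network rather than this toy model.

```python
import random
from itertools import product
from statistics import mean


def make_toy_data(n_per_class=20, seed=42):
    """Hypothetical 1-D 'vocal energy' feature with two made-up mood labels."""
    rng = random.Random(seed)
    calm = [(rng.gauss(0.3, 0.1), "calm") for _ in range(n_per_class)]
    upbeat = [(rng.gauss(0.7, 0.1), "upbeat") for _ in range(n_per_class)]
    return calm + upbeat


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the sample indices and deal them into k disjoint folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def knn_predict(train, query, n_neighbors):
    """Majority vote among the n_neighbors training points closest to query."""
    nearest = sorted(train, key=lambda item: abs(item[0] - query))[:n_neighbors]
    labels = [label for _, label in nearest]
    # Iterating over sorted labels makes tie-breaking deterministic.
    return max(sorted(set(labels)), key=labels.count)


def cross_val_accuracy(data, k, n_neighbors):
    """Mean accuracy over k folds, each fold held out once for validation."""
    scores = []
    for fold in k_fold_indices(len(data), k):
        held_out = set(fold)
        train = [row for i, row in enumerate(data) if i not in held_out]
        valid = [data[i] for i in fold]
        hits = sum(knn_predict(train, x, n_neighbors) == y for x, y in valid)
        scores.append(hits / len(valid))
    return mean(scores)


def grid_search(data, param_grid, k=5):
    """Try every combination in param_grid and keep the best CV score."""
    best_params, best_score = None, -1.0
    names = list(param_grid)
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = cross_val_accuracy(data, k, **params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


data = make_toy_data()
param_grid = {"n_neighbors": [1, 3, 5, 7]}
best_params, best_score = grid_search(data, param_grid, k=5)
print(best_params, round(best_score, 3))
```

The point of the sketch is the structure: every hyperparameter combination is scored on held-out folds rather than on the data it was fitted to, so the chosen setting is less likely to be an artefact of one lucky split.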


This is a beginner-level talk. The session is aimed at individuals who have a basic understanding of neural networks and Python. The purpose of the presentation is to bring the idea of personalizing voice assistants to the table, discuss approaches to implementing it in Python and gather the audience’s feedback on it.

Content URLs:

Slide deck (Work in progress)

Speaker Info:

Megha is a software developer at Amazon, Bengaluru and an open source enthusiast. She worked with Wikimedia as an intern during Outreachy (2018) and Google Summer of Code (2018). She has recently developed an interest in the field of Deep Learning and is using this project as a way to delve deeper into it. She regularly attends open source conferences and gave a talk at PyCon India last year.

Shailesh is a senior software developer at Amazon, Bengaluru, where he has worked for the last 7 years. He has an active interest in machine learning and Android development.

Id: 1303
Section: Data Science, Machine Learning and AI
Type: Talks
Target Audience: Beginner
Last Updated: