How we are building Smart Reply for Chat

Nirant Kasliwal (~NirantK)



Verloop is India’s largest conversational automation platform. We work with some of India’s largest, category-defining companies to serve their users, including Decathlon, MPL, and Milkbasket. We continue to double our user conversations every 6 months, and our systems scale to peaks of ~5x our everyday message volume. Verloop supports conversations on WhatsApp and web interfaces.

This scaling is often accompanied by a linear increase in the number of agents, and in their frustration. We focus on eliminating that frustration. There are several ways we do this, including canned replies (on the "/" command) and a new approach: Smart Replies.

Smart Replies suggests responses to each agent based on their personal and their organisation's chat history.

While Google has built this at their data and engineering scale, can we do it with the resources of a small two-person ML team? The answer seems to be yes.

Talk Objective

This is a Data Science in Production case study. I share not just the success story and what worked, but also the bottlenecks (e.g. small data), the mistakes we made (e.g. wrong vectorization and evaluation metrics), the ML engineering challenges (e.g. keeping latencies low enough for this to be usable), and the things we learnt along the way (e.g. the need for response diversity and privacy-first data pre-processing).

Basic outline of the talk

  1. Smart Reply for Chat: What are we trying to do? [2 minutes]
  2. How Google built this: A quick summary [4-5 minutes]
  3. How We are building this: The Embed-Encode-Cache architecture [Total: 8-10 minutes]
    • Modeling the problem: Cluster, Rank, Retrieve instead of Classify
    • Vectorization Options: Beyond word2vec
    • Opinion: What vectorization works for which use case?
    • Model Design: Balancing Latency, Cost & Performance
    • Choosing the Right Evaluation Metrics
  4. Things we learnt along the way [Total: 8-10 minutes]
    • Response Variety Bias Factor [2 minutes]
    • User Privacy challenges [2-3 minutes]
    • Limitations of Natural Language Generation [4 minutes]
  5. Questions from Audience [3-5 minutes]
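To make the "Cluster, Rank, Retrieve instead of Classify" idea in the outline concrete, here is a toy sketch of the retrieval step. This is purely illustrative and not Verloop's production code: the embed function stands in for a real pre-trained sentence encoder, and the candidate list stands in for clustered historical replies.

```python
import math
import random


def embed(text: str, dim: int = 8) -> list:
    """Toy stand-in for a sentence encoder: a deterministic random unit
    vector seeded by the text. A real system uses pre-trained embeddings."""
    rng = random.Random(text)  # same text -> same vector
    v = [rng.gauss(0, 1) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(a: list, b: list) -> float:
    """Cosine similarity of two unit vectors is their dot product."""
    return sum(x * y for x, y in zip(a, b))


def retrieve(message: str, candidates: list, top_k: int = 3) -> list:
    """Rank candidate responses by similarity to the incoming message and
    return the top_k as suggestions -- retrieval, not classification."""
    q = embed(message)
    return sorted(candidates, key=lambda r: -cosine(q, embed(r)))[:top_k]


suggestions = retrieve(
    "Where is my order?",
    [
        "Your order is on the way.",
        "Thanks for reaching out!",
        "Please share your order ID.",
    ],
)
```

The key design point is that adding a new suggested reply only requires adding a candidate to the pool, with no model retraining, which is why retrieval suits a small team better than an end-to-end classifier.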

In this talk, you will learn how we are building and deploying a state-of-the-art deep learning NLP solution. Our methods work under small data, latency constraints, and multi-tenant horizontal scaling. We are building this entirely with Python-based open-source software.
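The "Cache" in the Embed-Encode-Cache architecture is what keeps latencies low: candidate-response embeddings can be computed once and memoised, so serving only needs to embed the incoming message. A minimal stdlib sketch of that idea (the toy embedding below is a hypothetical stand-in for a real encoder):

```python
import functools
import math
import random


@functools.lru_cache(maxsize=50_000)
def cached_embed(text: str) -> tuple:
    """Memoised toy embedding. In production the cache would hold encoder
    outputs, so repeated candidate responses cost nothing at serve time."""
    rng = random.Random(text)
    v = [rng.gauss(0, 1) for _ in range(8)]
    norm = math.sqrt(sum(x * x for x in v))
    return tuple(x / norm for x in v)  # tuples are hashable and immutable


# First call computes the embedding; the identical second call is a cache hit.
cached_embed("Your order is on the way.")
cached_embed("Your order is on the way.")
hits = cached_embed.cache_info().hits
```

Because canned and historical replies repeat heavily across conversations, even a simple in-process cache like this removes most encoder calls from the hot path.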


A basic familiarity with the following would be useful for this talk:

  • Word embedding algorithms such as word2vec
  • Sequence-to-sequence modeling ideas such as LSTMs and Transformers
  • PyData and Pre-trained Models in NLP

Content URLs:

Slides: (Google Slides Talk)

Speaker Info:

Nirant has worked across startups and MNCs in Machine Learning and Data Science roles. These include:

  • Soroco - Computer Vision (Image Segmentation) for building Search over Enterprise Documents
  • Samsung Research at the Advanced Technologies Lab - Sensor Fusion & Event Classification
  • (NLP/Predictive Analytics)

In his present role at Verloop, he focuses on Conversational AI.

He has written a book on practical NLP for developers (published by Packt). The book is a quickstart guide for developers who want to build NLP solutions without first wading through pedantic treatments of Linguistics and Deep Learning.

Recognition & Contributions

  • Won the Kaggle NLP Kernel Prize from Kaggle and Explosion.AI (makers of spaCy)
  • Lead Maintainer for awesome-nlp with ~8.5K stars
  • awesome-nlp is featured in GitHub's official Machine Learning collection as a leading NLP resource
  • FastAI International Fellowship: 2018 & 2019

Recent Talks:

  • inMobi Tech Talks: A Nightmare on the LM Street; Slides
  • Wingify DevFest: NLP for Indian Languages; Slides, Video
  • PyData Bengaluru Inaugural Talk: Video, Resources

Speaker Links:

  • Personal Website:
  • Twitter:
  • Github:
  • LinkedIn:
  • Book:

Id: 1333
Section: Data Science, Machine Learning and AI
Type: Talks
Target Audience: Intermediate
Last Updated: