Optimizing Machine Learning Models for Deployment



With advances in access to compute and data, there has been a revolution in fields such as computer vision, natural language processing, and speech recognition. However, the boon that enabled the current ML revolution is also proving to be its bane: once a model is trained, there remains the burning question of how to deploy it in production. This demands highly optimized models along several dimensions: size, computational cost, power consumption, and memory footprint. Modern deep learning models tend to be demanding on all of these fronts. To tackle this problem, several optimization paradigms have been proposed, broadly divided into during-training and post-training optimization.

In our talk, we will present how one can optimize deep learning models using both of these approaches, through the TensorFlow Model Optimization Toolkit. The techniques covered include Post-Training Quantization and Weight Pruning. Both techniques aim to reduce model size, latency, and power consumption with negligible loss in accuracy.

Post-Training Quantization - General techniques, applied after training is complete, that reduce model size while also improving CPU and hardware-accelerator latency, with little degradation in model accuracy.
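To make the idea concrete, here is a minimal, framework-free sketch of the affine (asymmetric) quantization arithmetic that post-training quantization relies on. The function names and the 8-bit range are illustrative, not TensorFlow's own API:

```python
def quantize(values, num_bits=8):
    """Map a list of floats onto the integer range [0, 2**num_bits - 1]."""
    qmin, qmax = 0, 2 ** num_bits - 1
    lo, hi = min(values), max(values)
    scale = (hi - lo) / (qmax - qmin)       # float step per integer step
    zero_point = round(qmin - lo / scale)   # integer that represents 0.0
    quantized = [
        max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values
    ]
    return quantized, scale, zero_point


def dequantize(quantized, scale, zero_point):
    """Recover approximate floats from the quantized integers."""
    return [(q - zero_point) * scale for q in quantized]


weights = [-1.0, 0.0, 0.5, 2.0]
q, scale, zp = quantize(weights)        # int8-range integers + parameters
restored = dequantize(q, scale, zp)     # each value within one scale step
```

In practice, the TensorFlow Lite converter performs this for an entire model: set `converter.optimizations = [tf.lite.Optimize.DEFAULT]` on a `tf.lite.TFLiteConverter` before calling `convert()`.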

Weight Pruning - Eliminating unnecessary values in the weights of a neural network. Applying this technique intelligently sets parameter values to zero, removing what we estimate to be unnecessary connections.
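A minimal sketch of magnitude-based pruning on a flat list of weights; the helper name and the `sparsity` parameter are our own (the toolkit itself operates on whole Keras layers):

```python
def prune_by_magnitude(weights, sparsity=0.5):
    """Zero out the fraction `sparsity` of weights with the smallest magnitude."""
    n_prune = int(len(weights) * sparsity)
    # Indices of the n_prune smallest-magnitude weights
    smallest = sorted(range(len(weights)), key=lambda i: abs(weights[i]))[:n_prune]
    pruned = list(weights)
    for i in smallest:
        pruned[i] = 0.0
    return pruned


print(prune_by_magnitude([0.1, -2.0, 0.05, 3.0]))  # → [0.0, -2.0, 0.0, 3.0]
```

The toolkit's equivalent is `tfmot.sparsity.keras.prune_low_magnitude`, which wraps a Keras model or layer and increases sparsity gradually over training according to a pruning schedule.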

MorphNet - A recent development in the design of efficient neural network architectures. The philosophy is to take an existing model and, in one shot, optimize it for the task at hand (e.g., object detection or segmentation). This is done via a combination of the methods mentioned earlier.
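The core idea can be sketched as a resource-weighted sparsity penalty: each channel's BatchNorm scale (gamma) is taxed in proportion to the FLOPs it costs, and channels whose gamma is driven to near zero are structurally removed. This is a conceptual sketch, not the MorphNet library's API; the function names, threshold, and strength are illustrative:

```python
def morphnet_style_penalty(gammas, channel_costs, strength=1e-3):
    """Resource-weighted L1 penalty added to the task loss: expensive
    channels are pushed toward zero harder than cheap ones."""
    return strength * sum(abs(g) * c for g, c in zip(gammas, channel_costs))


def surviving_channels(gammas, threshold=1e-2):
    """Channels whose scale stayed above the threshold are kept; the rest
    are removed, shrinking the layer."""
    return [i for i, g in enumerate(gammas) if abs(g) > threshold]


gammas = [0.9, 0.001, 0.4, 0.0005]    # learned per-channel scales
costs = [100.0] * 4                   # e.g. FLOPs attributable to each channel
penalty = morphnet_style_penalty(gammas, costs)
kept = surviving_channels(gammas)     # → [0, 2]
```

After training with such a penalty, the surviving channel counts define a new, slimmer architecture that can be retrained from scratch.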


Outline:

  • 3 min : Deep Learning Boom and Challenges in Deployment
  • 2 min : Need for Model Optimization
  • 1 min : Introduction to Methods of optimization
  • 5 min : Post Training Quantization
  • 4 min : Weight Pruning
  • 3 min : MorphNet
  • 7 min : TensorFlow Model Optimization Toolkit + Demo
  • 5 min : Q&A


Prerequisites:

  • Intermediate proficiency in Python
  • Intermediate knowledge of Machine Learning and Neural Networks
  • Introductory knowledge of TensorFlow

Speaker Info:

Niladri Shekhar Dutt

Undergraduate Researcher working on Deep Learning and its applications in Computer Vision and NLP. He spent his last semester at the University of California, Berkeley, where he worked at the CITRIS and the Banatao Institute. He has won several hackathons, including the San Francisco DeveloperWeek Hackathon 2019 (America's largest challenge-driven hackathon). His current research focuses on self-driving cars and training machine learning models with limited data. He organized AI Saturdays Kattankulathur last year, where he taught Stanford's CS224n (Natural Language Processing with Deep Learning) to more than 200 students. He loves the hackable nature of Python and will be speaking at PyCon Taiwan later this year.

Sree Harsha Nelaturu

Undergraduate Researcher working on Deep Learning and its applications in Computer Vision, Cognitive Sciences, Signal Processing, and Creativity. He has spent a semester at the Massachusetts Institute of Technology, working with the Responsive Environments group at the MIT Media Lab. His current work focuses on cognitive-science-inspired modelling, deep learning compilers for model optimization in medical and edge-based deployment, and working with GANs for content creation.

Speaker Links:

Niladri Shekhar Dutt:

Sree Harsha Nelaturu:

Id: 1319
Section: Data Science, Machine Learning and AI
Type: Talks
Target Audience: Intermediate
Last Updated: