Compression of Neural Networks

Vishal Gupta (~vishal11)




Neural networks are both computationally intensive and memory intensive, making them difficult to deploy on embedded systems with limited hardware resources.

Compression of neural networks (NNs) has become a highly studied topic in recent years. The main driver is the demand for industrial-scale use of NNs: deploying them on mobile devices, storing them efficiently, transmitting them over band-limited channels and, most importantly, running inference at scale.

A number of papers have been published in the last few years proposing different approaches to minimize the footprint of neural networks. The aim of my talk is to summarize recent developments and techniques in this field, quoting benchmarks, algorithms and results from papers. At a high level, there are two basic types of compression: Network Pruning and Quantization.

Network Pruning: The motive behind network pruning is to selectively nullify or remove some nodes in order to reduce the size of the NN without losing much accuracy. Not only does this reduce the space required to store the model, it also reduces the number of computations per sample. A number of papers in the last 2 years have suggested using Bayesian inference and Variational Dropout, a probabilistic approach to estimating deterministic weights, and selectively pruning some of them after sparsifying the respective weight matrices.
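
To make the idea of pruning concrete, here is a minimal sketch in NumPy. It uses plain magnitude-based pruning (zeroing the weights with the smallest absolute value), which is a simpler stand-in for the Bayesian and Variational Dropout methods mentioned above, not an implementation of them. The function name `magnitude_prune` and the `sparsity` parameter are assumptions made for illustration.

```python
import numpy as np


def magnitude_prune(weights, sparsity):
    """Zero out (roughly) the `sparsity` fraction of smallest-magnitude weights.

    Returns the pruned weights and the boolean mask of weights that were kept.
    This is simple magnitude pruning, used here only as an illustration.
    """
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    magnitudes = np.abs(weights).ravel()
    k = int(sparsity * magnitudes.size)  # number of weights to remove
    if k == 0:
        return weights.copy(), np.ones(weights.shape, dtype=bool)
    # The k-th smallest magnitude acts as the pruning threshold.
    threshold = np.partition(magnitudes, k - 1)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask, mask


# Hypothetical 4x8 weight matrix standing in for one layer of a network.
rng = np.random.default_rng(0)
w = rng.normal(size=(4, 8)).astype(np.float32)
pruned, mask = magnitude_prune(w, sparsity=0.75)
print("kept", int(mask.sum()), "of", mask.size, "weights")
```

The pruned matrix only saves space or computation if it is stored in a sparse format, or if whole rows or columns (that is, whole nodes) end up zero and can be dropped from the layer.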

Quantization: Conventionally, weights are stored and operations are performed with 32-bit floating point numbers. With the rising need to run models on constrained devices, neural networks can be further compressed either by reducing the number of unique weights through clustering or by reducing the number of bits required to represent each weight. This can also add a regularizing effect, sometimes resulting in higher accuracy than the uncompressed model.
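
Both flavours of quantization described above can be sketched in a few lines of NumPy. The first pair of functions maps 32-bit floats to 8-bit integers with a uniform (affine) scheme; the last function shares weights by clustering them with a tiny 1-D k-means and storing only a cluster index per weight. All function names, the linear centroid initialisation and the parameter defaults are assumptions chosen for illustration, not the method of any particular paper.

```python
import numpy as np


def quantize_uniform(weights, num_bits=8):
    """Affine uniform quantization of float weights to unsigned integers."""
    if not 1 <= num_bits <= 8:
        raise ValueError("this sketch supports 1 to 8 bits")
    qmax = 2 ** num_bits - 1
    w_min, w_max = float(weights.min()), float(weights.max())
    scale = (w_max - w_min) / qmax
    if scale == 0.0:
        scale = 1.0  # constant tensor: every weight maps to level 0
    q = np.round((weights - w_min) / scale).astype(np.uint8)
    return q, scale, w_min


def dequantize(q, scale, w_min):
    """Map integer levels back to approximate float weights."""
    return q.astype(np.float32) * scale + w_min


def cluster_weights(weights, num_clusters=4, num_iters=20):
    """Weight sharing: 1-D k-means returning per-weight indices and a codebook."""
    flat = weights.ravel()
    # Assumed initialisation: centroids spread linearly over the weight range.
    centroids = np.linspace(flat.min(), flat.max(), num_clusters)
    for _ in range(num_iters):
        assign = np.argmin(np.abs(flat[:, None] - centroids[None, :]), axis=1)
        for c in range(num_clusters):
            members = flat[assign == c]
            if members.size:  # leave empty clusters where they are
                centroids[c] = members.mean()
    assign = np.argmin(np.abs(flat[:, None] - centroids[None, :]), axis=1)
    return assign.reshape(weights.shape), centroids


# Hypothetical layer weights.
rng = np.random.default_rng(0)
w = rng.normal(size=(16, 16)).astype(np.float32)

q, scale, w_min = quantize_uniform(w, num_bits=8)
w_hat = dequantize(q, scale, w_min)
print("max 8-bit reconstruction error:", float(np.max(np.abs(w - w_hat))))

idx, codebook = cluster_weights(w, num_clusters=4)
shared = codebook[idx]  # every weight replaced by its cluster centroid
print("unique weights after clustering:", np.unique(shared).size)
```

With 8-bit quantization each weight takes a quarter of the space of a 32-bit float, and the reconstruction error per weight is bounded by about half of `scale`. With 4 clusters each weight needs only a 2-bit index plus a small shared codebook.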


Knowledge of Bayes' Theorem, Convolutional Neural Networks and common image classification datasets.

Content URLs:

Will add slides later. Have added links to papers in my description.

Speaker Info:

Hello world. I’m Vishal Gupta, a final year CSE undergrad at SSN, Chennai, India. A Python programmer at heart and an ML enthusiast by inspiration, I have worked on a number of different projects, some out of boredom and some for startups. This summer I had the chance to work at Microsoft Research India (Bangalore) on using Bayesian Compression on Object Detection Networks (tiny-yolo) and deploying it on an FPGA board. I was working with a team from IIITD guided by Prof. Saket Anand. I'm also participating in Google Summer of Code 2018 under Debian.

Past Experience:

  • Chatbot intern at GoBumpr, Chennai
  • CV intern at XR Labs, Chennai
  • NLP intern at BicycleAI, Bangalore

Speaker Links:

  • Complete list of projects
  • LinkedIn - Vishal Gupta
  • GitHub - py-ranoid

Id: 1016
Section: Embedded python
Type: Talks
Target Audience: Intermediate
Last Updated: