Efficient and Optimal Deep Learning Inference for Computer Vision Applications

Venkatesh W (~venkatesh53)




Background and Motivation:

A cognitive solution becomes meaningful only when it is put to use and solves business problems in real time through inference. Deep learning model inference is as important as model training. When cognitive solutions are deployed on the edge, inference becomes even more critical, because it also determines the performance and accuracy of the deployed solution. For a given computer vision application, once the deep learning model is trained, the next step is to make it deployment/production ready, which requires both the application and the model to be efficient and reliable.

It is essential to maintain a healthy balance between model performance/accuracy and inference time. Inference time determines the running cost of cloud solutions, while cost-optimal edge solutions come with processing speed and memory constraints, so it is important to have memory-optimal, real-time (low processing time) deep learning models. At the same time, everyone wants the best possible model accuracy.
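As a rough illustration of why inference time needs to be measured before it can be balanced against accuracy, the sketch below times repeated calls to a prediction function and reports the median and worst-case latency. It is only a minimal sketch: `measure_latency_ms` and `dummy_predict` are hypothetical names introduced here, and `dummy_predict` is a stand-in for a real model's forward pass, not something taken from the talk.

```python
import statistics
import time


def measure_latency_ms(predict, sample, warmup=3, runs=20):
    """Return (median, worst) latency in milliseconds of predict(sample)."""
    for _ in range(warmup):
        predict(sample)  # warm-up calls are executed but not timed
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        predict(sample)
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(timings), max(timings)


def dummy_predict(pixels):
    # Hypothetical stand-in for a real model's forward pass (assumption).
    return sum(p * 0.5 for p in pixels)


if __name__ == "__main__":
    image = [0.1] * 10_000  # fake flattened input, for illustration only
    median_ms, worst_ms = measure_latency_ms(dummy_predict, image)
    print(f"median {median_ms:.3f} ms, worst {worst_ms:.3f} ms")
```

In practice, `dummy_predict` would be replaced by the actual inference call, and the same measurement would be repeated on the target cloud instance or edge device, since latency on a development machine says little about latency on constrained hardware.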


This talk focuses on various techniques for running accelerated, memory-optimal deep learning model inference without compromising too much on model performance/accuracy. It also covers deep learning inference frameworks/libraries that are mostly written in C++ and can be used from Python, and how and where to use them. It then covers the techniques and tools that enable us to optimise model inference on edge and mobile devices. The session closes with techniques to improve the accuracy/performance of the computer vision application as a whole.

Outline of the Talk:

  1. Background and Need for optimisation [3 min]
  2. Inference Acceleration and Optimisation Techniques [8-10 min]
  3. Role of C++ Inference Accelerator Frameworks and Libraries and their Python APIs [5 min]
  4. Optimal Inference on edge and mobile devices - tools & techniques for Python and other languages [5 min]
  5. Techniques to improve the accuracy of sample computer vision applications [5 min]
  6. Q and A [3 min]

Key Takeaways:

  1. Understanding the need for memory and processing optimisations for deep learning models in computer vision applications

  2. Role of C++ in enabling accelerated and optimal model inference in Python

  3. Various tools and techniques needed to reduce the inference cost of deep-learning-based computer vision applications

  4. Efficient and optimal model inference on edge and mobile devices

  5. Trade-offs between memory requirements, processing time and accuracy during model inference

Who this talk is for:

  1. You want to know how model inference optimisations are done.

  2. You have trained models and deployed them on the cloud, and now want to know which techniques reduce cost or make the solution run in real time when the models are deployed on edge/mobile devices.

  3. You have used deep learning frameworks such as PyTorch, TensorFlow and Keras extensively for training and inference, and now want to learn which deep learning inference frameworks to use, and how and where to use them.

  4. You have basic knowledge of CNN (Convolutional Neural Network) training and inference and want to learn intermediate-level material about model inference.


Prerequisites:

  • Basic knowledge of Convolutional Neural Networks
  • Basic knowledge of computer vision based deep learning
  • Basic knowledge of Python

Video URL:


Speaker Info:

Venkatesh is an AI practitioner and AI content creator with 8+ years of experience. As a solution consultant at Sahaj Software Solutions, he helps businesses solve complex problems using AI-powered solutions. He specialises in Deep Learning, Computer Vision, Machine Learning, embedded-AI and business intelligence.

He has worked on a variety of solutions, ranging from building a DL inference engine for an e-AI chip and implementing a computer-vision-based brand monitoring pipeline to research and development in ATM surveillance through deep learning.

Speaker Links:

Section: Data Science, Machine Learning and AI
Type: Talks
Target Audience: Intermediate
Last Updated: