Efficient and Optimal Deep Learning Inference for Computer Vision Applications
Venkatesh W (~venkatesh53)
Background and Motivation:
The journey of a cognitive solution is meaningful only when it is put to use and actually solves business problems in real time through inference. Deep learning model inference is as important as model training, and when cognitive solutions are deployed on the edge, inference becomes even more critical, as it controls both the performance and the accuracy of the implemented solution. For a given computer vision application, once the deep learning model is trained, the next step is to make it deployment/production ready, which requires both the application and the model to be efficient and reliable.
It is essential to maintain a healthy balance between model accuracy, memory footprint, and inference time. Inference time determines the running cost of cloud-hosted solutions, while edge solutions come with processing-speed and memory constraints, so it is important to have memory-optimal, real-time (low-latency) deep learning models. At the same time, the best possible model accuracy is what anyone would want.
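Since inference time directly drives running cost, a useful first step before any optimisation is measuring a model's baseline latency. A minimal pure-Python sketch, where `model` is a placeholder standing in for a real network's forward pass:

```python
import statistics
import time

def model(x):
    # Placeholder forward pass; in practice this would be a real
    # framework call (e.g. a network's predict/forward method).
    return [v * 0.5 + 1.0 for v in x]

def benchmark(fn, inp, warmup=10, runs=100):
    """Return (median_ms, p95_ms) latency over `runs` timed calls."""
    for _ in range(warmup):            # warm caches / trigger lazy init
        fn(inp)
    timings = []
    for _ in range(runs):
        t0 = time.perf_counter()
        fn(inp)
        timings.append((time.perf_counter() - t0) * 1000.0)
    timings.sort()
    return statistics.median(timings), timings[int(0.95 * len(timings))]

median_ms, p95_ms = benchmark(model, list(range(1024)))
print(f"median {median_ms:.3f} ms, p95 {p95_ms:.3f} ms")
```

Reporting the median and a tail percentile (rather than a single run) is what makes before/after comparisons of the optimisation techniques below meaningful.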
This talk focuses on various techniques for running accelerated, memory-optimal deep learning model inference without compromising too much on model performance/accuracy. The session also covers deep learning inference frameworks/libraries, mostly written in C++, that can be leveraged from Python to achieve this goal, and how and where to use them. We then discuss the techniques and tools that enable us to optimise model inference on mobile devices. Finally, the session covers techniques to improve the accuracy/performance of a computer vision application as a whole.
Outline of the Talk:
- Background and Need for optimisation [3 min]
- Inference Acceleration and Optimisation Techniques [8-10 min]
- Role of C++ Inference Accelerator Frameworks and Libraries and their Python APIs [5 min]
- Optimal Inference on the edge and mobile devices - tools & techniques for Python and other languages [5 min]
- Techniques to improve the accuracy of sample computer vision applications [5 min]
- Q and A [3 min]
Key Takeaways:
- Understanding the need for memory and processing optimisations for deep learning models in computer vision applications
- Role of C++ in enabling accelerated and optimal model inference in Python
- Various tools and techniques to reduce the inference cost of deep learning model based computer vision applications
- Efficient and optimal model inference on edge and mobile devices
- Memory requirements vs. processing time vs. accuracy trade-offs during model inference
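The memory-vs-accuracy trade-off can be made concrete with post-training quantization, one of the techniques the talk covers. A minimal sketch of symmetric linear quantization of weights to int8 in pure Python (each weight then needs 1 byte instead of 4, at the cost of a small reconstruction error); function names here are illustrative, not from any particular library:

```python
def quantize_int8(weights):
    """Symmetric linear quantization of float weights to int8 values.

    Returns (q, scale), where q holds integers in [-127, 127] and the
    original weights are approximated by qi * scale.
    """
    max_abs = max(abs(w) for w in weights) or 1.0
    scale = max_abs / 127.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [qi * scale for qi in q]

weights = [0.02, -1.3, 0.7, 0.001, -0.45]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
# Each restored weight lies within half a quantization step of the original.
assert all(abs(w - r) <= scale / 2 + 1e-9 for w, r in zip(weights, restored))
```

Real toolchains (and the quantization papers referenced below) go further with per-channel scales and calibration data, but the memory/accuracy trade-off is exactly this mechanism.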
Who is the talk for?:
- Someone who wants to know how model inference optimisations are done.
- You have worked extensively on training models and deploying them on the cloud. Now, if they are to be deployed on edge/mobile devices, what techniques should be used to reduce cost or to make the solution work in real time?
- You have extensively used deep learning frameworks like PyTorch, TensorFlow, Keras, etc. for training and inference, and now want to learn which deep learning inference frameworks to use, and how and where to use them.
- You have basic knowledge of CNN (Convolutional Neural Network) training and inference and want to learn some intermediate-level material about model inference.
Prerequisites:
- Basic knowledge of Convolutional Neural Networks
- Basic knowledge of computer vision based deep learning
- Basic knowledge of Python
Influenced by the following research work:
- Low-bit Quantization of Neural Networks for Efficient Inference
- A Survey on Methods and Theories of Quantized Neural Networks
- Pruning Convolutional Neural Networks for Resource Efficient Inference
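The pruning paper above learns which filters to remove; a much simpler illustration of the same idea is magnitude-based weight pruning, which zeroes the smallest-magnitude weights to reach a target sparsity. A sketch with illustrative names:

```python
def magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with smallest |w|."""
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    n_prune = int(len(weights) * sparsity)
    # Rank weight indices by magnitude; keep everything above the cut.
    ranked = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    keep = set(ranked[n_prune:])
    return [w if i in keep else 0.0 for i, w in enumerate(weights)]

w = [0.9, -0.05, 0.3, 0.01, -0.7, 0.2]
pruned = magnitude_prune(w, 0.5)   # drop the 3 smallest-magnitude weights
# → [0.9, 0.0, 0.3, 0.0, -0.7, 0.0]
```

The resulting zeros shrink the model when stored in a sparse format and can speed up inference on runtimes that exploit sparsity; in practice pruning is followed by fine-tuning to recover accuracy.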
Venkatesh is an AI practitioner and AI content creator with 8+ years of experience. As a solution consultant at Sahaj Software Solutions, he helps businesses solve complex problems using AI-powered solutions. He specialises in Deep Learning, Computer Vision, Machine Learning, embedded AI and business intelligence.
He has worked on a variety of solutions, ranging from building a DL inference engine for an e-AI chip and implementing a computer-vision-based brand monitoring pipeline, to research and development in the area of ATM surveillance through deep learning.