Processing videos effectively - Piping the Parallelism out of Python

R S Nikhil Krishna (~rsnk)


6 Votes

Description:

Almost all of us have used VLC, simply because it's so good at what it does: it reads multiple file formats, transcodes videos, and makes basic filtering (brightness correction, etc.) effortless.

But have you ever wondered what makes VLC so efficient? VLC uses libavcodec in the backend, the library through which it accesses the API of FFmpeg, a key piece of video-processing software. However, one is seldom satisfied with the basic filters that libraries like libavcodec offer, which leads us to write our own code using libraries like OpenCV and scikit-image.

Such code is generally inefficient, though: it does not make the best use of the machine's hardware, nor does it encode the output with minimal loss.

In this talk, we will take a look at what it takes to process videos in Python as efficiently as libavcodec/FFmpeg, in terms of both speed and quality:

  • Speeding up video processing (a simple example: watermarking) by more than 2x in Python by parallelizing across frames, (a) using multiple CPU cores, (b) using GPU cores
  • Scaling up to 4K and 8K videos - better quality and compression by piping the data from Python to FFmpeg
  • Minimizing I/O access by storing transport streams in memory from Python
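The frame-parallel idea in the first bullet can be sketched in a few lines. This is a minimal, dependency-free illustration (a toy "watermark" that brightens a corner of a grayscale frame held as bytes); in real code you would hand the pool OpenCV/numpy frames instead. The frame size and function names here are our own, not from the talk.

```python
# Sketch: parallelizing a per-frame operation across CPU cores with
# multiprocessing.Pool. Frames are plain bytes of grayscale pixels to keep
# the example dependency-free.
from multiprocessing import Pool

WIDTH, HEIGHT = 64, 48
MARK = 16  # side length of the watermark square, in pixels

def add_watermark(frame: bytes) -> bytes:
    """Brighten the top-left MARK x MARK block of one grayscale frame."""
    out = bytearray(frame)
    for y in range(MARK):
        for x in range(MARK):
            i = y * WIDTH + x
            out[i] = min(255, out[i] + 80)
    return bytes(out)

def process_frames(frames, workers=4):
    """Apply the watermark to every frame in parallel, preserving order."""
    with Pool(workers) as pool:
        return pool.map(add_watermark, frames)

if __name__ == "__main__":
    frames = [bytes(WIDTH * HEIGHT) for _ in range(8)]  # 8 black frames
    marked = process_frames(frames)
    print(marked[0][0])  # pixel inside the watermark block: 80
```

Because `Pool.map` preserves input order, the processed frames come back in the same sequence they would be encoded in, which is what makes this pattern safe for video.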

The talk will be structured as follows:

  • Quick basics of computer vision - how images and videos are stored and managed, and how to handle videos in Python using FFmpeg and OpenCV, an open-source computer vision library (5 min)
  • Using piping and multiprocessing to smartly speed up and improve your video processing (15 min)
  • Simple parallelism in Python making use of the multiprocessing module, and how it extends to CV (Computer Vision) (5 min)
  • Space vs Time difference in parallel processing videos
  • Parallelizing the video processing pipeline on the GPU using numba and cudatoolkit
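The piping step above can be sketched as follows: instead of re-encoding frame by frame in Python, raw frames are streamed over a pipe to an ffmpeg subprocess, which does the encoding. The flags shown are standard ffmpeg rawvideo options; the output path, codec choice, and function names are illustrative assumptions, not the talk's actual code.

```python
# Sketch: piping raw frames from Python into an ffmpeg subprocess so that
# encoding is handled by ffmpeg's encoders rather than in Python.
import subprocess

def ffmpeg_encode_cmd(width, height, fps, out_path):
    """Build an ffmpeg argv that reads raw RGB24 frames on stdin."""
    return [
        "ffmpeg", "-y",
        "-f", "rawvideo",       # input is headerless pixel data
        "-pix_fmt", "rgb24",
        "-s", f"{width}x{height}",
        "-r", str(fps),
        "-i", "-",              # read frames from stdin
        "-c:v", "libx264",      # illustrative codec choice
        "-pix_fmt", "yuv420p",
        out_path,
    ]

def encode(frames, width, height, fps, out_path):
    """Stream each frame (bytes of length width*height*3) to ffmpeg."""
    proc = subprocess.Popen(
        ffmpeg_encode_cmd(width, height, fps, out_path),
        stdin=subprocess.PIPE,
    )
    for frame in frames:
        proc.stdin.write(frame)
    proc.stdin.close()
    proc.wait()
```

Writing to the pipe costs no intermediate files, which is also how the in-memory transport-stream point above avoids disk I/O.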

Prerequisites:

Preferable:

  • Understanding of the Python GIL and multiprocessing
  • Basics of what FFmpeg is and its usage, what makes it efficient
  • Basic understanding of Cuda kernels

Content URLs:

The presentation, which is still being curated, can be viewed here

The code to generate the results can be viewed here

The audience might find these links useful; however, they are not a prerequisite.

Speaker Info:

At the Computer Vision and Intelligence (CVI) Group at IIT Madras, both speakers have taken sessions introducing hundreds of college students to Python, Computer Vision and AI; the material is openly accessible at their GitHub repo

R S Nikhil Krishna

Nikhil is a final-year student at IIT Madras. He currently leads the Computer Vision and AI team at DeTect Technologies and has previously headed the CVI group at CFI, IIT Madras. He has worked on semi-autonomous tumour detection for automated brain surgery at the Division of Remote Handling and Robotics, BARC, and on importance sampling for accelerated gradient optimization methods applied to deep learning at EPFL, Switzerland. His love for Python started about four years ago, with a multitude of computer vision projects like QR code recognition, facial expression identification, etc.

Lokesh Kumar T

Lokesh is a third-year student at IIT Madras. He currently co-heads the CVI group at CFI. He uses Python for computer vision, deep learning, and language analysis. At DeTect Technologies, he has worked on automating chimney and stack inspections using computer vision and on on-board vision-based processing for drones. His interest in Python began during his stay at IIT Madras, from institute courses to CVI projects like face recognition, hand-gesture control of bots, etc.

Speaker Links:

R S Nikhil Krishna

Lokesh Kumar T

Section: Core python and Standard library
Type: Talks
Target Audience: Advanced
Last Updated: