How to implement a YOLO object detector from scratch using PyTorch and OpenCV

Ayoosh Kathuria (~ayoosh)




The workshop will walk the audience through implementing a state-of-the-art object detector (YOLO: You Only Look Once) from scratch using the PyTorch deep learning framework.

The main aim is not merely to show the audience how to implement a detector that works on videos, but to give them deep insight into the problems that rear their heads only when one is implementing a deep architecture. These issues include:

  1. Rapid Prototyping with PyTorch: Which PyTorch classes and abstractions to use to quickly code up a neural network. How to implement a layer if it doesn't already ship with PyTorch. Our detector has 3 such layers!

  2. How to deal with complex architectures efficiently: What if your network has more than 100 layers? Our detector has 106! Do we write out all 106 layers by hand, one line of code per layer? What if we want to run our detector over a folder containing 100,000 images that we can't fit into RAM at once? Best PyTorch practices for getting around problems like these will be discussed.

  3. Speeding up Python code with Vectorisation: Python can be a slow language, but PyTorch provides many functions that are thin wrappers around very fast C code. Vectorisation and broadcasting will be covered in great detail. Using vectorised code instead of loops for iterative tasks can give speedups of as much as 100x. Our detector cannot work in real time without these optimisations.

  4. Managing GPU resources: How to write device-agnostic code, parallelize GPU/CPU ops, practices to reduce redundant GPU memory usage, and how to time GPU code.
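
As a rough illustration of points 3 and 4, the sketch below computes the intersection over union (IoU) of one bounding box against many, first with a Python loop and then with broadcasting, on whichever device is available. The function names, the (x1, y1, x2, y2) box layout, and the sample boxes are assumptions made for this example; this is not the workshop's actual code.

```python
import torch

# Device-agnostic setup: use the GPU when one is present, else the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")


def iou_loop(box, boxes):
    """IoU of `box` against each row of `boxes`, one Python iteration per row."""
    bx1, by1, bx2, by2 = box.tolist()
    ious = []
    for ox1, oy1, ox2, oy2 in boxes.tolist():
        # Width and height of the overlap, clipped at zero when boxes are disjoint.
        iw = max(min(bx2, ox2) - max(bx1, ox1), 0.0)
        ih = max(min(by2, oy2) - max(by1, oy1), 0.0)
        inter = iw * ih
        union = (bx2 - bx1) * (by2 - by1) + (ox2 - ox1) * (oy2 - oy1) - inter
        ious.append(inter / union)
    return torch.tensor(ious, device=boxes.device)


def iou_vectorised(box, boxes):
    """Same computation with no Python loop.

    `box` has shape (4,) and `boxes` has shape (N, 4). Each box[i] is a
    0-dim tensor that broadcasts against the length-N column boxes[:, i].
    """
    x1 = torch.maximum(box[0], boxes[:, 0])
    y1 = torch.maximum(box[1], boxes[:, 1])
    x2 = torch.minimum(box[2], boxes[:, 2])
    y2 = torch.minimum(box[3], boxes[:, 3])
    inter = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)
    area_box = (box[2] - box[0]) * (box[3] - box[1])
    area_boxes = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_box + area_boxes - inter)


# Hypothetical sample boxes, created directly on the chosen device.
boxes = torch.tensor(
    [[0.0, 0.0, 2.0, 2.0], [1.0, 1.0, 3.0, 3.0], [5.0, 5.0, 6.0, 6.0]],
    device=device,
)
ious = iou_vectorised(boxes[0], boxes)
```

Both functions return the same values; the vectorised one pushes the per-box work into PyTorch's compiled kernels. When timing code like this on a GPU, call `torch.cuda.synchronize()` before reading the clock, because CUDA kernels are launched asynchronously.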

We will review the entire code base and spend considerable time justifying design decisions. Much of the non-critical code will be provided to the audience as is, while they are expected to code along for the critical parts, which will be discussed in greater detail. Important PyTorch features may also be demonstrated using toy examples outside the detector code base, and the audience is expected to code along with these as well. A Docker image as well as Jupyter notebooks will be provided to the audience. Google Colab may also be considered, with notebooks provided.

Most online tutorials demonstrate code that is more proof of concept than performant. When it comes to learning to code complex architectures, especially when transitioning from the beginner to the intermediate stage, most of us have to rely on the laborious process of reading open source code. The idea of this workshop is to help the audience move along this journey.


Prerequisites:

  1. Knowledge of Python
  2. Basic understanding of convolutional neural networks, image classification and preferably, but not necessarily, object detection (about 15 minutes will be spent on an overview of the YOLO algorithm)
  3. Basic understanding of PyTorch (the level that can be reached by taking the official 60-minute tutorial)

Content URLs:

Tutorial Series

GitHub Repo (the most-starred repo for a Python implementation of YOLO v3, with 589 stars at the time of writing)

Speaker Info:

I'm currently a research intern at a DRDO Lab, where I work on video semantics: detecting violence as well as unusual activity in surveillance footage. My other interests include weakly supervised learning, unsupervised learning, and generative modelling using GANs. I recently graduated from college, where I founded AI Circle, SMVDU, a club dedicated to helping students get started with machine learning through lectures and hands-on sessions, many of which I conducted. I am very passionate about sharing what I've learned, and I write articles regularly at Paperspace and Medium.

Speaker Links:

Paperspace blog:

Medium:

GitHub:

Section: Data science
Type: Workshops
Target Audience: Intermediate
Last Updated:

Not the right place, but can you share your contact details so that I can clarify my doubts related to YOLOv3?

Mohit Rathore (~markroxor)
