Effectively Debugging Deep Neural Networks
Jaidev Deshpande (~jaidev)
Deep learning is expensive. Not only is there a steep (human) learning curve, there is also an immense cost in designing and training a deep neural network. In typical R&D settings, it is very common for a deep network to take days to train. To make things even tougher, there are no guarantees of convergence. It is not uncommon to find that a network has learnt nothing even after hours or days of training. And it is likely that we as practitioners, too, might not learn much from the experience. The worst possible way to deal with an untrainable network is to leave it alone (apart from a few minor tweaks like changing the learning rate or picking a different training subset of the data) and let it run for another few hours or days.
Python is best known as a language that allows rapid prototyping. But even this feature is at its least impressive when it comes to deep learning (understandably so, since Python is just the top layer in most deep learning frameworks). Nevertheless, there are many techniques one can employ to get better feedback from the network, and even to fail fast, thereby saving precious time.
While no algorithm or technique can guarantee that a network will learn anything to a specified degree, there are many practices we can use to be more confident about the performance of the network, as opposed to being totally in the dark. One should be able to say things of this sort with some confidence:
"The loss should have dropped below X by now."
"It should have learnt to classify at least the second category from the rest."
"It should clearly not be taking so long to converge."
This is an advanced workshop intended to make users comfortable with debugging deep networks.
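A statement such as "the loss should have dropped below X by now" can be turned into an automatic check. The following is a minimal sketch in plain Python, not material taken from the workshop itself: the function name `first_missed_milestone`, the milestone values and the loss curves are all hypothetical, chosen only to illustrate the idea of failing fast.

```python
def first_missed_milestone(losses, milestones):
    """Return the first epoch (1-based) whose loss exceeds its limit, else None.

    `losses` is a sequence of per-epoch loss values.
    `milestones` maps an epoch number to the highest loss we are willing
    to accept at the end of that epoch (hypothetical expectations).
    """
    for epoch, loss in enumerate(losses, start=1):
        limit = milestones.get(epoch)
        if limit is not None and loss > limit:
            return epoch
    return None


# Hypothetical expectations: loss at most 1.0 by epoch 3, at most 0.5 by epoch 6.
milestones = {3: 1.0, 6: 0.5}

# Made-up loss curves for illustration only.
healthy = [2.3, 1.4, 0.9, 0.7, 0.6, 0.45]
stuck = [2.3, 2.3, 2.29, 2.3, 2.3, 2.3]

print(first_missed_milestone(healthy, milestones))  # no milestone missed
print(first_missed_milestone(stuck, milestones))    # flagged at the first missed milestone
```

In a real training loop, the same comparison could run at the end of every epoch (for instance from a Keras callback's `on_epoch_end` hook) and stop training as soon as a milestone is missed, rather than letting the run continue for hours.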
- Basics of neural networks: The audience should know what the different hyperparameters of neural networks are, especially the learning rate, gradient optimizers, regularization methods, etc. We will be learning how to pick the correct combination of these for a specific problem.
- Entry-level experience with Keras
- Basics of either TensorFlow or Theano
- A laptop with at least a quad-core processor (you should see four cores when you open the Task Manager or htop) and at least 4GB of memory.
I am a data scientist based in New Delhi. I currently work as the Practice Lead, Data Science at Juxt Smartmandate Analytical Solutions Pvt Ltd. I am an active member of the Python community. I've spoken at various conferences about my FOSS work. My research interests are in signal processing and machine learning. In my spare time I like to dabble with applications of machine learning in personal productivity.