How to track your Machine Learning Experiments Effectively

Sanyam Bhutani (~init27)


Description:

The usual workflow for a machine learning experiment is very different from that of software engineering. This talk will highlight how to track experiments and their iterative nature inside a Jupyter notebook, how to apply these ideas effectively to Kaggle competitions, and how to make them work within data science teams.

GitHub, GitLab, and similar alternatives do not account for the different way ML pipelines are used. The talk will share best practices and alternative tools, based on my experience with Kaggle and prototyping.

I'll also present free options for tracking experiments effectively: spreadsheets and Slack, along with Jovian, an open source library that I contribute to. I'll cover the best methods and use cases for each.

The talk will also share the most common pitfalls for tracking experiments.

Here is the rough Outline for my Talk:

10 minutes: Introduction to the problem:

  • Why is Tracking ML Experiments difficult?
  • Definition, the workflow of the ML Experiment
  • Iterative Nature of the Problem
  • Why is GitHub not good enough? e.g. Tracking Datasets, 100 models, etc.

10 minutes: Ways to Track experiments:

  • Spreadsheets
  • Heavily Annotated Jupyter notebooks
  • Slack
  • Jovian (Free + Open Source), a tool for Jupyter

5 minutes:

  • Best Practices for Kaggle
  • Best Practices for a collaborative Experiment
  • Things to avoid

5 minutes: Q&A
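
As a rough illustration of the spreadsheet option from the outline, the sketch below appends one row per experiment run to a CSV file that can be opened in any spreadsheet tool. It is not taken from the talk or from Jovian: the function name `log_experiment`, the file name, and the column names are all hypothetical choices made for this example.

```python
import csv
import os
from datetime import datetime, timezone

# Hypothetical column layout; adjust to whatever you actually tune and measure.
FIELDS = ["timestamp", "experiment", "learning_rate", "batch_size",
          "val_accuracy", "notes"]


def log_experiment(path, experiment, learning_rate, batch_size,
                   val_accuracy, notes=""):
    """Append one experiment run as a row of a CSV 'spreadsheet'."""
    # Write the header only when the file is new or empty.
    is_new = not os.path.exists(path) or os.path.getsize(path) == 0
    row = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "experiment": experiment,
        "learning_rate": learning_rate,
        "batch_size": batch_size,
        "val_accuracy": val_accuracy,
        "notes": notes,
    }
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow(row)
    return row


if __name__ == "__main__":
    # Example usage with made-up values, e.g. called at the end of a notebook cell.
    log_experiment("experiments.csv", "baseline", 0.001, 32, 0.81)
    log_experiment("experiments.csv", "bigger-batch", 0.001, 64, 0.83,
                   notes="same model, batch size doubled")
```

Keeping hyperparameters and metrics in the same row is the main point of the design: each change to the setup stays next to the result it produced, which is hard to recover from Git history alone.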

Prerequisites:

  • Familiarity with Python
  • Familiarity with Jupyter notebook
  • Some experience with building Machine Learning models

Content URLs:

Update: Draft slides. These give an overview and the gist of the talk. I'm happy to iterate further or make changes if suggested.

https://docs.google.com/presentation/d/1E6tdAoUQPMZVmC4zt-2FyJ5-Lud4eayW864_seyqiGU/edit?usp=sharing

Speaker Info:

Sanyam is a Machine Learning and Computer Vision practitioner, recognized by media such as inc42 and Economic Times. He is a Kaggle Triple Expert (ranked top 1% in all categories) and an active blogger on Medium, which recognizes him as a "Top Writer in Artificial Intelligence".

Sanyam has done various research and industrial internships based on Deep Learning applications at the Indian Institute of Technology Madras, the Indian Institute of Technology Roorkee, ONGC, and Tech-Mahindra. He has a background in Computer Science and is an active contributor to multiple Machine Learning communities: Fastai, TWIMLAI, DS India, AISaturdays, and Kaggle-Noobs.

Speaker Links:

Twitter: https://twitter.com/bhutanisanyam1
Kaggle: https://www.kaggle.com/init27/
Blog: https://medium.com/@init_27
LinkedIn: linkedin.com/in/sanyambhutani/

Section: Data Science, Machine Learning and AI
Type: Talks
Target Audience: Beginner
Last Updated: