How to make continuous integration work with machine learning
Elle O'Brien (~elleobrien)
Continuous integration is a key practice from DevOps that has been demonstrated to speed up development cycles, but hasn't yet found widespread adoption in data science and machine learning projects. Yet continuous integration has many potential applications to machine learning, particularly in model retraining, evaluation, and data viz generation.
This talk will cover:
- Why continuous integration has historically been difficult in data science and machine learning
- How to create a basic continuous integration system with open source tools in the Git ecosystem (GitHub Actions, GitLab CI, and an open-source project called Continuous Machine Learning)
- How to use GitHub Actions and open source tools to automatically retrain and evaluate models in production-like environments, plus generate data viz and human-readable reports
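To make the last point concrete, here is a minimal sketch of the kind of script a continuous integration job might run on every push: it retrains a model, evaluates it on held-out data, and writes a human-readable Markdown report. This is not the code from the talk. The synthetic dataset, the one-feature linear model, the function names, and the `report.md` filename are all assumptions chosen so the example runs with only the Python standard library.

```python
import random


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def make_dataset(n, seed=0):
    """Synthetic 1-D regression data (assumed for illustration): y = 3x + 2 + noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [3.0 * x + 2.0 + rng.gauss(0, 1.0) for x in xs]
    return xs, ys


def fit_line(xs, ys):
    """Ordinary least squares for one feature; returns (slope, intercept)."""
    mean_x, mean_y = mean(xs), mean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_squared_error(model, xs, ys):
    """Mean squared error of the fitted line on the given points."""
    slope, intercept = model
    return mean([(slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)])


def render_report(train_mse, test_mse):
    """Build a small Markdown report that a CI step could publish."""
    return "\n".join([
        "## Model evaluation",
        "",
        "| split | MSE |",
        "| --- | --- |",
        f"| train | {train_mse:.4f} |",
        f"| test | {test_mse:.4f} |",
        "",
    ])


def main(report_path="report.md"):
    xs, ys = make_dataset(200)
    split = 150  # first 150 points for training, last 50 held out
    model = fit_line(xs[:split], ys[:split])
    train_mse = mean_squared_error(model, xs[:split], ys[:split])
    test_mse = mean_squared_error(model, xs[split:], ys[split:])
    with open(report_path, "w") as f:
        f.write(render_report(train_mse, test_mse))
    return test_mse


if __name__ == "__main__":
    main()
```

In a setup like the one described above, a workflow step would call this script after checking out the repository, and a later step could attach the resulting report to the commit or pull request so reviewers see the metrics next to the code change.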
Python code examples will be made publicly available for hands-on investigation of the concepts and methods explored in the talk.
You will get the most out of this talk if you know Git basics (committing and pushing), and are aware of some high-level concepts from data science and machine learning (for example, the standard workflow of training models on a dataset and evaluating on held-out data points). However, I will deliver this talk so that true beginners will still be able to appreciate the big ideas.
Dr. Elle O'Brien is a data scientist at Iterative, Inc. (the team behind DVC) and one of the creators of Continuous Machine Learning (CML), an open source project for advancing DevOps practices in data science. She holds a PhD from the University of Washington and has presented on data science, DevOps, and scientific methods at more than 25 academic and industry meetings. Previously, she conducted research in computational neuroscience and speech perception, and worked as the Chief Scientist at Botnik Studios, an AI-comedy writing collective.
Some related work