DevOps for Machine Learning: Deploying ML Models at Scale
Prabakaran Kumaresshan (~prabakaran16)
I swear by the Dutch, this is not an ML Workshop*
It is my impression that the world of deep learning research is starting to plateau. What's booming: deploying DL to real-world problems.
I trod the same path when I started as a founding ML Engineer. Over the past two years, I have learned that solid engineering is essential for building ML applications at web scale. Productionizing an ML model is the last mile of the journey: the most dreaded and least talked-about topic. Knowing the right toolchain to automate your build pipeline is essential for APIfying your ML models.
A typical ML pipeline is accompanied by a big data infrastructure that de-normalizes and preprocesses the application data to prepare training data, followed by a microservice that exposes the trained model artifact as a service on a runtime component.
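To make that two-part shape concrete, here is a minimal, stdlib-only sketch, assuming in-memory lists of dicts stand in for the application tables that PySpark would process at scale. The table layouts, the `denormalize`, `train_popularity_model` and `recommend` functions, and the JSON artifact format are all hypothetical, chosen only to show how the preprocessing side hands a model artifact to the serving side.

```python
import json
from collections import Counter

# Hypothetical normalized application tables (stand-ins for what PySpark would read).
USERS = [{"user_id": 1, "country": "IN"}, {"user_id": 2, "country": "NL"}]
MOVIES = [{"movie_id": 10, "title": "Alpha"}, {"movie_id": 20, "title": "Beta"},
          {"movie_id": 30, "title": "Gamma"}]
VIEWS = [{"user_id": 1, "movie_id": 10}, {"user_id": 1, "movie_id": 20},
         {"user_id": 2, "movie_id": 20}, {"user_id": 2, "movie_id": 30}]


def denormalize(views, users, movies):
    """Join each view event with its user and movie rows into flat training records."""
    users_by_id = {u["user_id"]: u for u in users}
    movies_by_id = {m["movie_id"]: m for m in movies}
    return [{**v, **users_by_id[v["user_id"]], **movies_by_id[v["movie_id"]]} for v in views]


def train_popularity_model(records):
    """'Train' a trivial model by counting views per title; returns a JSON-serializable artifact."""
    counts = Counter(r["title"] for r in records)
    return {"version": 1, "popularity": dict(counts)}


def recommend(artifact, seen, k=2):
    """Serving side: rank titles by view count (ties broken alphabetically), skipping seen ones."""
    ranked = sorted(artifact["popularity"].items(), key=lambda kv: (-kv[1], kv[0]))
    return [title for title, _ in ranked if title not in seen][:k]


if __name__ == "__main__":
    records = denormalize(VIEWS, USERS, MOVIES)
    artifact_json = json.dumps(train_popularity_model(records))  # what the build would publish
    model = json.loads(artifact_json)  # what the microservice would load at startup
    print(recommend(model, seen={"Beta"}))  # prints ['Alpha', 'Gamma']
```

Serializing the artifact to a plain format is what lets the training job and the microservice be built, tested and deployed independently.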
In this workshop, we will explore the DevOps toolchain to build, train, test, deploy and monitor an ML Model. The focus will be on the toolchain and how to automate the entire process from commit to deployment.
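One way to picture "from commit to deployment" is as an ordered list of stages where any failure stops the run, which is the behaviour a CI/CD server such as Jenkins enforces. The sketch below is a hypothetical, stdlib-only stand-in, not Jenkins itself: the stage functions, the image tag format, the `accuracy` metric and the 0.8 quality threshold are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Tuple

# A stage is a (name, function) pair; the function mutates a shared context and returns success.
Stage = Tuple[str, Callable[[Dict], bool]]


def build(ctx):
    ctx["image"] = f"pyflix-reco:{ctx['commit'][:7]}"  # hypothetical image tag from the commit SHA
    return True


def train(ctx):
    # Stand-in for a real training job; the accuracy value is simulated.
    ctx["metrics"] = {"accuracy": ctx.get("simulated_accuracy", 0.85)}
    return True


def quality_gate(ctx):
    # Block deployment if the model underperforms (threshold is an assumption).
    return ctx["metrics"]["accuracy"] >= 0.8


def deploy(ctx):
    ctx["deployed"] = ctx["image"]
    return True


def run_pipeline(stages: List[Stage], ctx: Dict) -> List[str]:
    """Run stages in order and stop at the first failure; return the names that passed."""
    passed = []
    for name, step in stages:
        if not step(ctx):
            print(f"stage '{name}' failed, stopping")
            break
        passed.append(name)
    return passed


PIPELINE: List[Stage] = [("build", build), ("train", train), ("test", quality_gate), ("deploy", deploy)]

if __name__ == "__main__":
    print(run_pipeline(PIPELINE, {"commit": "3f2a9c1d5e"}))
```

Treating model quality as a gate in the test stage means a regression in the model blocks a release just as a failing unit test would.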
To illustrate the whole process, we will build a toy recommendation application for Pyflix, an on-demand streaming service provider.
Here is the reference Application Architecture for our Pyflix Recommendation Engine.
- Introduction to DevOps Culture
- Quick Introduction to ML/Big Data tools used in the Application - PySpark, scikit-learn (if required)
- Introduction to Containers and Cloud Infrastructure (Docker and AWS)
- Introduction to Infrastructure as Code (Terraform and Ansible)
- Building CI/CD pipeline with Jenkins
- Building Data Pipeline with Airflow
- Building RESTful Service with Django REST framework
- Application Architecture Introduction - Pyflix
- Putting It All Together to Build the Recommendation Engine
The workshop will revolve around DevOps tools for building an ML pipeline. We will implement a rudimentary recommendation engine, so a basic understanding of ML is enough. We will start with an introduction to DevOps and the tools used; however, a good understanding of DevOps culture will help participants get the most out of the workshop.
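Because the engine is deliberately rudimentary, something as simple as item co-occurrence ("viewers who watched X also watched Y") is enough to follow along. The sketch below is a stdlib-only illustration under that assumption; it is not necessarily the algorithm used in the workshop, and the watch histories and function names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_cooccurrence(histories):
    """Count how often each ordered pair of titles appears in the same user's watch history."""
    co = defaultdict(Counter)
    for titles in histories.values():
        for a, b in permutations(set(titles), 2):
            co[a][b] += 1
    return co


def recommend(co, watched, k=3):
    """Score unseen titles by summed co-occurrence with titles the user already watched."""
    scores = Counter()
    for title in watched:
        for other, n in co.get(title, {}).items():
            if other not in watched:
                scores[other] += n
    # Sort by score descending, then alphabetically for a deterministic tie-break.
    return [t for t, _ in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))][:k]


if __name__ == "__main__":
    # Hypothetical watch histories for illustration only.
    histories = {
        "u1": ["Alpha", "Beta", "Gamma"],
        "u2": ["Alpha", "Beta"],
        "u3": ["Beta", "Delta"],
    }
    co = build_cooccurrence(histories)
    print(recommend(co, watched={"Alpha"}))  # prints ['Beta', 'Gamma']
```

In a real pipeline the co-occurrence counts would be computed in the batch (PySpark) stage and only the resulting table shipped to the serving microservice.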
The edX course on DevOps by Microsoft is a great resource, but it is not required for this workshop.
The demo can be set up either locally with Docker or in the cloud.
- Basic understanding of Containers
- Basic understanding of Cloud Infrastructure (AWS)
- Basic understanding of ML/Big Data (PySpark)
- A little reading up on Jenkins and Airflow will help
- For the local demo:
  - A Linux PC, preferably with 8 GB of RAM (Windows and Mac users need to perform some additional steps to install Docker)
  - Docker Compose
- For AWS:
  - awscli with configured credentials
By profession, Prabakaran Kumaresshan designs algorithms to score complex user interactions, classify user-generated content, derive insights, and APIfy them to run at scale. He has been data wrangling for 5+ years and specializes in NLP. He uses Jupyter to analyze data that fits in his PC's memory, PySpark for anything that doesn't, and Django+DRF to create microservices, embracing DevOps culture, mostly on AWS. Occasionally he gives talks at local meetups.