From "Hello World" to production: serving Machine Learning models with TensorFlow Serving and Kubernetes.
Do you have trouble running your Machine Learning models in production, or want to learn some of the industry's best practices for deploying models and serving predictions? In a local environment the task is simple, but things become complex when a model runs in production: scaling, robustness and reliability are just a few of the concerns one has to take care of.
This talk answers that question by walking the audience through a worked example using established industry practices.
TensorFlow Serving "is a flexible, high-performance serving system for machine learning models, designed for production environments". It lets you deploy newer versions of a model to production without changing either the architecture or the API. You can configure automatic deployment of a new model version as soon as it becomes available and run it alongside the older one for a no-frills transition, or retire the older model completely and run the improved one in production. With TensorFlow Serving one can easily deploy and manage models in production at scale, and even automate the deployment pipeline.
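As a sketch of how this version management works (the model name and paths below are hypothetical), TensorFlow Serving can be started with a model config file that pins which versions are served side by side:

```
model_config_list {
  config {
    name: "my_model"                 # hypothetical model name
    base_path: "/models/my_model"    # directory containing numbered version subdirectories
    model_platform: "tensorflow"
    # Serve versions 1 and 2 simultaneously for a gradual transition.
    model_version_policy {
      specific {
        versions: 1
        versions: 2
      }
    }
  }
}
```

Dropping a version from this list (or switching the policy back to serving only the latest version) retires the old model without restarting the server.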
Kubernetes is used to deploy the resulting serving setup, which enables better scaling and robustness. We will see how properties of a Kubernetes cluster, such as elasticity and resource management, can be used to serve models to a large number of users and to run computationally and data-intensive models in production.
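To make this concrete, here is a minimal sketch of a Kubernetes Deployment for the official `tensorflow/serving` image (the model name, replica count and resource requests are illustrative assumptions, not values from the talk):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tf-serving
spec:
  replicas: 3                      # scale out by adjusting this (or via an autoscaler)
  selector:
    matchLabels:
      app: tf-serving
  template:
    metadata:
      labels:
        app: tf-serving
    spec:
      containers:
      - name: tf-serving
        image: tensorflow/serving
        env:
        - name: MODEL_NAME
          value: my_model          # hypothetical model name
        ports:
        - containerPort: 8501      # REST API
        - containerPort: 8500      # gRPC
        resources:
          requests:
            cpu: "500m"
            memory: 1Gi
```

Exposing these pods behind a Service gives a single stable endpoint while Kubernetes handles restarts and load distribution across replicas.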
The audience will be taken through each step of the process, with in-depth insights into the tools used, answering the What, Why and How along the way.
Outline/structure of the Session
- Build an example Machine Learning model.
- TensorFlow Serving: learning to serve predictions at scale and manage multiple models in production.
- Containerising the app.
- Introduction to Kubernetes and how the created models can be deployed through it for robustness and scalability.
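Once the model is served, clients talk to it over TensorFlow Serving's REST predict API. The sketch below (the server URL and `half_plus_two` model name are assumptions; substitute your own deployment's address and model) builds the JSON body the API expects and posts it using only the Python standard library:

```python
import json
from urllib import request

# Hypothetical endpoint: a TensorFlow Serving container exposing the REST
# API on port 8501, serving a model named "half_plus_two".
SERVER_URL = "http://localhost:8501/v1/models/half_plus_two:predict"


def build_predict_request(instances):
    """Build the JSON body for TensorFlow Serving's REST predict endpoint."""
    return json.dumps({"instances": instances})


def predict(instances, url=SERVER_URL):
    """POST instances to the model server and return its predictions."""
    body = build_predict_request(instances).encode("utf-8")
    req = request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())["predictions"]


if __name__ == "__main__":
    # Requires a running model server, e.g. the half_plus_two demo model.
    print(predict([1.0, 2.0, 5.0]))
```

The same request shape works against a Kubernetes Service fronting the serving pods; only the URL changes.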
Learn about TensorFlow Serving and how its architecture helps in managing and serving models at scale (it is used in production by Google itself).
Learn how to deploy models served with TensorFlow Serving on Kubernetes for scalability and robustness.
The audience will be taken through an easy-to-follow example during the talk, and by the end will know the best practices for deployment in the field of Machine Learning and how to apply them.