Deploying a Machine Learning Inference Server in Production

Chandim Sett (~chandim)





Once a software application has been developed, deploying it in production is a whole new ball game. Expectations such as high availability, fault tolerance, and scalability set in, and all of them must be addressed. In this talk we take an ML inference server application (a real-time Flask-based server that serves machine learning predictions in response to incoming client requests containing model features) and describe how we can implement some of these deployment strategies using a popular Docker orchestration tool, Docker Swarm.
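To make the kind of application concrete, here is a minimal sketch of a prediction endpoint. It is not the talk's actual server: to stay dependency-free it is written as a plain WSGI callable from the Python standard library rather than with Flask, and the feature names, weights, and the /predict path are all hypothetical placeholders for a real trained model and API.

```python
import io
import json
from wsgiref.util import setup_testing_defaults

# Hypothetical stand-in for a trained model: fixed weights for a linear score.
WEIGHTS = {"amount": 0.5, "days_overdue": 2.0}


def predict(features):
    """Return a score as the weighted sum of the known numeric features."""
    return sum(WEIGHTS[name] * float(features[name]) for name in WEIGHTS)


def app(environ, start_response):
    """WSGI app: POST /predict with a JSON body of features returns a JSON score."""
    if environ["REQUEST_METHOD"] != "POST" or environ["PATH_INFO"] != "/predict":
        status, payload = "404 Not Found", {"error": "use POST /predict"}
    else:
        try:
            length = int(environ.get("CONTENT_LENGTH") or 0)
            features = json.loads(environ["wsgi.input"].read(length))
            status, payload = "200 OK", {"prediction": predict(features)}
        except (KeyError, TypeError, ValueError):
            # Missing feature, non-numeric value, or malformed JSON.
            status, payload = "400 Bad Request", {"error": "invalid features"}
    body = json.dumps(payload).encode("utf-8")
    start_response(status, [("Content-Type", "application/json"),
                            ("Content-Length", str(len(body)))])
    return [body]


def call(wsgi_app, method, path, data=None):
    """Invoke a WSGI app in-process and return (status, decoded JSON body)."""
    raw = json.dumps(data).encode("utf-8") if data is not None else b""
    environ = {}
    setup_testing_defaults(environ)
    environ.update({"REQUEST_METHOD": method, "PATH_INFO": path,
                    "CONTENT_LENGTH": str(len(raw)),
                    "wsgi.input": io.BytesIO(raw)})
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    body = b"".join(wsgi_app(environ, start_response))
    return captured["status"], json.loads(body)
```

The call helper exercises the app without opening a socket, for example call(app, "POST", "/predict", {"amount": 100, "days_overdue": 3}). To try it over HTTP, the same app object can be handed to wsgiref.simple_server.make_server. Because the handler keeps no state between requests, several identical copies can run side by side, which is the property that replication under an orchestrator relies on.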

Who this is for

Individuals looking for strategies for deploying applications in an industrial setting. This talk should be helpful for data science, software engineering, and DevOps professionals who want to understand how applications are deployed using Docker orchestration (a machine learning inference server is used as the example).

What this is about

In this talk we discuss various deployment strategies. First, we cover the expectations we have set for deployments: continuous integration and deployment, high availability, scalability, fault tolerance, and minimal maintenance. Next, we brush up on Docker basics and Docker terminology such as containerization, images, containers, registries, and repositories. We then get familiar with some deployment terminology and visualize each term schematically. After that, we take a use case, the ML Inference Server, and see how we can achieve our deployment goals using Docker Swarm. Finally, we discuss the organisational roles and responsibilities involved in deployment management and talk briefly about build and release pipelines.


  • Expectations of Ideal Deployment [2 mins]
  • Docker Basics and Terminology [3 mins]
  • Schematic Representation of the Terminologies [4 mins]
  • Application Overview: ML Inference Server [2 mins]
  • Docker Compose file for ML Inference Server [3 mins]
  • ML Inference Server Deployment with Docker Swarm [4 mins]
  • Deployment Management Layers [3 mins]
  • Build and Release Pipeline for Docker Apps [3 mins]


Anybody with a basic knowledge and understanding of the following items:

  • Software Development
  • Python
  • Data science engineering
  • Docker Basics

Speaker Info:

Chandim Sett works as a machine learning software engineer for HighRadius, a pioneer in the area of SaaS-based products for the Financial Supply Chain Industry. He is a Computer Science graduate from Kalinga Institute of Industrial Technology, Bhubaneswar and has served as the key person in the deployment of machine learning pipelines for large scale enterprise applications. His areas of expertise include enablement of enterprise applications for Data Sciences, DevOps implementation, and Infrastructure automation. He is also a co-author of the book "Datascience for enterprises":

Section: Developer tools and automation
Type: Talks
Target Audience: Intermediate
Last Updated: