Designing a production-grade real-time Machine Learning Inference Endpoint

Chandim Sett (~chandim)





A real-time ML inference endpoint is a web server that takes a trained model's feature data as input requests and produces predictions by loading the desired serialized model files into memory. This system enables machine learning predictions via a client-server model. In this talk we will see how to design a Python web service with Flask, a popular web server framework, while adhering to software engineering principles to achieve optimum production quality.
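The core pattern described above can be sketched in a few lines. The snippet below is a minimal, framework-free illustration (the model class, file name, and JSON keys are hypothetical, not from the talk): a serialized model is loaded into memory once, and a handler turns a JSON request body into a JSON prediction response.

```python
import json
import pickle

# Stand-in for a trained model: any object exposing predict().
# (ToyModel is illustrative only; a real service would load a model
# trained elsewhere, e.g. a scikit-learn estimator.)
class ToyModel:
    def predict(self, features):
        # Trivial rule for demonstration: sum each feature row.
        return [sum(row) for row in features]

# A training job would have produced this serialized model file.
with open("model.pkl", "wb") as f:
    pickle.dump(ToyModel(), f)

# The endpoint loads the serialized model into memory at startup...
with open("model.pkl", "rb") as f:
    model = pickle.load(f)

# ...and each request handler only parses features and returns predictions.
def handle_request(body: str) -> str:
    """Simulate the request handler: JSON features in, JSON predictions out."""
    payload = json.loads(body)
    predictions = model.predict(payload["features"])
    return json.dumps({"predictions": predictions})

print(handle_request('{"features": [[1, 2], [3, 4]]}'))
```

In a Flask service the `handle_request` logic would live inside a route handler, with the model loaded at application startup rather than per request.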

Who this is for

Anybody who wants to design and develop a Python application in an industrial setting. This is also for data science and software engineering professionals who are looking to design machine learning applications that serve predictions for their use cases.

What is it about

First we will describe the scope of the application, which is to load and serve serialized model files for predictions, and explore its various functionalities, such as cloud-native file handling. Then we will walk through the various components, including unit testing, documentation techniques, and dependency management. We will follow a standard project structure with logging enabled and project configurations that vary according to the SDLC/special environment the application runs in. Next we will understand the workflow of our inference endpoint and discuss its I/O formats, and finally we will discuss how to package and run the application in Docker.


  • Scope of our application [2 mins]
  • Exploring the functionalities [4 mins]
  • Project structure and components [3 mins]
  • Discussing Common Project Essentials [3 mins]
  • Project and Credentials Configurations Format [2 mins]
  • Deep dive in the workflow [3 mins]
  • I/O Format of the Inference Endpoint [2 mins]
  • Packaging and Running the application [3 mins]
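One way to vary project configuration by SDLC environment, as mentioned in the walkthrough above, is to key settings off an environment variable. A minimal sketch follows; the setting names, environment names, and the `APP_ENV` variable are illustrative assumptions, not the talk's actual scheme.

```python
import os

# Hypothetical per-environment settings; names are illustrative only.
CONFIGS = {
    "dev":  {"debug": True,  "model_dir": "./models"},
    "test": {"debug": True,  "model_dir": "/tmp/models"},
    "prod": {"debug": False, "model_dir": "/opt/models"},
}

def load_config(env=None):
    """Pick the configuration for the current SDLC environment.

    Falls back to the APP_ENV environment variable, then to "dev".
    """
    env = env or os.environ.get("APP_ENV", "dev")
    return CONFIGS[env]

print(load_config("prod"))
```

Credentials would typically be injected the same way, via environment variables or a secrets store, rather than being baked into the configuration dictionary.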


Prerequisites

Anybody with a basic knowledge and understanding of the following:

  • Software Development
  • Python
  • Data science engineering
  • Docker Basics

Speaker Info:

Chandim Sett works as a machine learning software engineer for HighRadius, a pioneer in the area of SaaS-based products for the Financial Supply Chain Industry. He is a Computer Science graduate from Kalinga Institute of Industrial Technology, Bhubaneswar, and has served as the key person in the deployment of machine learning pipelines for large-scale enterprise applications. His areas of expertise include the enablement of enterprise applications for data science, DevOps implementation, and infrastructure automation. He is also a co-author of the book "Datascience for enterprises".

Section: Data Science, Machine Learning and AI
Type: Talks
Target Audience: Intermediate
Last Updated: