Decoupling machine learning inference: The Kafka Way

Devesh Bajaj (~devesh50)


Machine learning is increasingly an engineering problem (scalable APIs, GPUs, TPUs, inference at the edge), and most classical infrastructure approaches tend to fail when pushed to their limits. At Episource, we faced exactly this challenge: with millions of inference requests to serve, we saw a growing need for a self-healing, fault-tolerant, distributed framework to remove the bottlenecks of demanding ML pipelines.

Episource’s machine learning & NLP platform is responsible for processing millions of pages of medical records, with close to 15 ML/DL models working in tandem to generate the final outcomes. With such a demanding pipeline, infrastructure errors would often fall through the cracks. In addition, because the models are strongly interdependent, a failure in one would set off a chain of errors downstream, affecting the SLAs of the processing pipelines.

After many iterations and experiments, we arrived at a model inference framework that leverages Kafka and gRPC on the cloud. Not only did model serving times decrease, but we were also able to remove much of the opacity around model monitoring, error logging, and debugging critical errors. It also let us decouple our architecture while still meeting our need for scale.

From this talk, participants can expect to gain an understanding of the following:

  1. Building stateless ML serving frameworks
  2. DAGs as Kafka Consumers
  3. Managing streams derived from Kafka topics for multiple handshakes
  4. gRPC vs REST: pros and cons for ML serving
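
As a rough illustration of the first three items above, the sketch below wires two stateless model stages into a small DAG, where each stage reads from one topic and writes to a derived topic. It is not Episource's implementation: the in-memory `Broker` stands in for Kafka so the example runs without a cluster, and the `Stage` class, the topic names, and the toy model functions are all hypothetical.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


class Broker:
    """In-memory stand-in for Kafka topics (assumption: illustration only)."""

    def __init__(self) -> None:
        self.topics: Dict[str, deque] = defaultdict(deque)

    def produce(self, topic: str, message: dict) -> None:
        self.topics[topic].append(message)

    def poll(self, topic: str) -> Optional[dict]:
        queue = self.topics[topic]
        return queue.popleft() if queue else None


@dataclass(frozen=True)
class Stage:
    """One DAG node: consume from `source`, apply `model`, produce to `sink`."""
    name: str
    source: str
    sink: str
    model: Callable[[dict], dict]  # pure function, no state kept between calls

    def step(self, broker: Broker) -> bool:
        """Process at most one message; return True if one was consumed."""
        message = broker.poll(self.source)
        if message is None:
            return False
        try:
            broker.produce(self.sink, {**message, **self.model(message)})
        except Exception as exc:
            # Failures go to a per-topic error stream so that one bad input
            # is visible and does not stall the stages downstream.
            broker.produce(
                f"{self.source}.errors",
                {**message, "stage": self.name, "error": str(exc)},
            )
        return True


def run(stages: List[Stage], broker: Broker) -> None:
    """Drive every stage until no stage has any input left."""
    progressed = True
    while progressed:
        progressed = False
        for stage in stages:
            if stage.step(broker):
                progressed = True


# Toy "models" (hypothetical): each returns only the fields it adds.
def classify_page(message: dict) -> dict:
    is_lab = "lab" in message["text"].lower()
    return {"page_type": "lab_report" if is_lab else "other"}


def extract_codes(message: dict) -> dict:
    return {"codes": [w for w in message["text"].split() if w.isupper()]}


PIPELINE = [
    Stage("classifier", "pages.raw", "pages.classified", classify_page),
    Stage("coder", "pages.classified", "pages.coded", extract_codes),
]
```

Because each stage keeps no state of its own, any number of identical workers could run the same `step` loop against a shared topic. In a real deployment, the `poll` and `produce` calls would be replaced by a Kafka client, and the model call could be a gRPC request to a model server.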

This talk will be an in-depth exploration of the thought process, experiments, and lessons learned that allowed us to perform streaming ML inference at scale.


Outline:

  1. A quick primer on problems with standard ML inference solutions (~5 mins)
  2. Introduction to Episource and the speaker (~3 mins)
  3. Introduction to the Kafka architecture (~5 mins)
  4. Introduction to the gRPC architecture (~5 mins)
  5. gRPC vs REST for ML inference (~5 mins)
  6. Key points for managing a stateless architecture (~2 mins)
  7. Q&A (~5 mins)

Slides for the talk

(Note: The slides may change slightly by the time of the conference.)


Prerequisites:

  1. Basic knowledge of the Kafka ecosystem
  2. Basic knowledge of how a REST API works

Speaker Info:

Devesh is a Data Engineer with Episource LLC, with close to 2 years of experience.

  • Primarily involved in building microservices that perform machine learning inference, along with developing and deploying end-to-end full-stack applications.

  • Has a keen passion for developing IoT-based applications, and won 2nd prize in the CanSat (2017) satellite design competition organised by the American Astronautical Society (AAS) and recognised by NASA.

Speaker Links:

You can find Devesh at

LinkedIn :-

Twitter :-

Medium :-

Github :-

Section: Decentralised and Distributed Technology
Type: Talks
Target Audience: Intermediate
Last Updated: