Asynchronous Programming for Scalable Machine Learning Pipelines in Python

Animesh Dutta (~animesh1)


Description:

This session explores how to integrate asynchronous programming techniques with machine learning workflows to build scalable, high-performance ML pipelines. I will cover advanced concepts and practical applications of Python's asyncio library for optimizing data ingestion, preprocessing, and model inference. A detailed case study will walk through building an end-to-end asynchronous ML pipeline for a real-world problem.

Objectives

  • Understand the role of asynchronous programming in optimizing complex machine learning workflows.
  • Explore advanced features of the asyncio library and their application in ML tasks.
  • Implement asynchronous data ingestion, preprocessing, and model inference in an ML pipeline.
  • Showcase a detailed, practical use case of an end-to-end asynchronous ML pipeline.
  • Illustrate the performance improvements and scalability benefits of asynchronous programming in machine learning.

Session Outline (30 minutes)

Introduction to Asynchronous Programming in ML (5 minutes)

  • Why scalable pipelines matter, and an overview of the benefits asynchronous programming brings to ML

Deep Dive into asyncio for ML Pipelines (5 minutes)

  • Key asyncio components: the event loop, coroutines, and tasks; concurrent data processing with asyncio.
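A minimal sketch of how the event loop, coroutines, and tasks fit together, assuming simulated I/O: `fetch_batch` is a hypothetical coroutine that uses `asyncio.sleep` as a stand-in for a network or database call. Because the tasks wait concurrently, the total time is close to a single delay rather than the sum of all delays.

```python
import asyncio
import time


async def fetch_batch(batch_id: int, delay: float) -> list[int]:
    """Hypothetical coroutine standing in for an I/O-bound fetch (e.g. a DB or API call)."""
    await asyncio.sleep(delay)  # yields control to the event loop while "waiting"
    return [batch_id * 10 + i for i in range(3)]


async def main(num_batches: int = 5, delay: float = 0.1) -> tuple[list[list[int]], float]:
    start = time.perf_counter()
    # Wrap each coroutine in a Task so the event loop schedules them concurrently.
    tasks = [asyncio.create_task(fetch_batch(i, delay)) for i in range(num_batches)]
    # gather() waits for all tasks and returns their results in submission order.
    results = await asyncio.gather(*tasks)
    elapsed = time.perf_counter() - start
    return results, elapsed


if __name__ == "__main__":
    # asyncio.run() creates the event loop, runs main() to completion, then closes the loop.
    batches, elapsed = asyncio.run(main())
    print(batches)
    print(f"elapsed: {elapsed:.2f}s")  # roughly one delay, not num_batches * delay
```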

Asynchronous Data Ingestion and Preprocessing (4 minutes)

  • Implementing asynchronous data ingestion from multiple sources, and parallel preprocessing of large data batches (offloading CPU-bound work to executors so it does not block the event loop).
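The ingestion-plus-preprocessing step can be sketched as a producer/consumer pipeline. This is an illustrative assumption, not the session's actual code: `ingest`, `preprocess`, `batcher`, and `run_pipeline` are hypothetical names, the sources are simulated with `asyncio.sleep`, and a simple per-record z-score stands in for real preprocessing. I/O-bound producers share a bounded `asyncio.Queue` (which provides backpressure), while CPU-bound batches are submitted to an executor via `loop.run_in_executor` without awaiting each one, so several batches can be processed in parallel.

```python
import asyncio
import statistics
from concurrent.futures import Executor, ProcessPoolExecutor, ThreadPoolExecutor


async def ingest(source: str, n_records: int, queue: asyncio.Queue) -> None:
    """Hypothetical producer: simulates reading records from one I/O-bound source."""
    for i in range(n_records):
        await asyncio.sleep(0.01)  # stand-in for network / disk latency
        record = [float(i), float(i + 1), float(i + 2)]
        await queue.put((source, record))  # waits if the queue is full (backpressure)


def preprocess(batch: list[list[float]]) -> list[list[float]]:
    """CPU-bound step (z-score each record); runs in an executor, off the event loop."""
    out = []
    for row in batch:
        mu = statistics.fmean(row)
        sd = statistics.pstdev(row) or 1.0  # avoid division by zero for constant rows
        out.append([(x - mu) / sd for x in row])
    return out


async def batcher(queue: asyncio.Queue, batch_size: int, executor: Executor) -> list[list[float]]:
    loop = asyncio.get_running_loop()
    pending, batch = [], []
    while (item := await queue.get()) is not None:  # None is the end-of-stream sentinel
        batch.append(item[1])
        if len(batch) == batch_size:
            # Submit without awaiting so several batches are preprocessed in parallel.
            pending.append(loop.run_in_executor(executor, preprocess, batch))
            batch = []
    if batch:  # flush the final partial batch
        pending.append(loop.run_in_executor(executor, preprocess, batch))
    processed: list[list[float]] = []
    for chunk in await asyncio.gather(*pending):
        processed.extend(chunk)
    return processed


async def run_pipeline(sources: dict[str, int], batch_size: int, executor: Executor) -> list[list[float]]:
    queue: asyncio.Queue = asyncio.Queue(maxsize=100)
    consumer = asyncio.create_task(batcher(queue, batch_size, executor))
    # All sources are ingested concurrently on the same event loop.
    await asyncio.gather(*(ingest(name, n, queue) for name, n in sources.items()))
    await queue.put(None)  # tell the consumer no more records are coming
    return await consumer


if __name__ == "__main__":
    sources = {"kafka_topic": 20, "s3_bucket": 15, "postgres": 10}  # hypothetical source names
    # A process pool sidesteps the GIL for pure-Python CPU-bound preprocessing.
    with ProcessPoolExecutor() as pool:
        rows = asyncio.run(run_pipeline(sources, batch_size=8, executor=pool))
    print(len(rows), rows[0])
```

The executor is a parameter on purpose: a `ProcessPoolExecutor` suits pure-Python CPU work, while a `ThreadPoolExecutor` can be enough when preprocessing calls into libraries that release the GIL.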

Asynchronous Model Inference and Deployment (4 minutes)

  • Asynchronous model inference for high-throughput prediction; integrating asynchronous inference with FastAPI for real-time services.
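One common way to get high-throughput asynchronous inference is dynamic micro-batching: concurrent requests are collected for a few milliseconds and scored in a single model call. The sketch below assumes a hypothetical `DummyModel` whose `predict` scores a whole batch, and hypothetical `max_batch`/`max_wait` settings; it is one possible pattern, not necessarily the one presented in the session. In a FastAPI service, an `async def` route handler could `await batcher.predict(...)`; that wiring is omitted so the sketch runs without third-party packages.

```python
import asyncio
import contextlib


class DummyModel:
    """Hypothetical stand-in for a real model; predict() scores a whole batch at once."""

    def predict(self, batch: list[list[float]]) -> list[float]:
        return [sum(features) for features in batch]


class MicroBatcher:
    """Groups concurrent requests into one batched model call."""

    def __init__(self, model: DummyModel, max_batch: int = 32, max_wait: float = 0.005):
        self.model = model
        self.max_batch = max_batch
        self.max_wait = max_wait  # seconds to let more requests accumulate before scoring
        self.queue: asyncio.Queue = asyncio.Queue()
        self.batch_sizes: list[int] = []  # recorded for illustration only

    async def predict(self, features: list[float]) -> float:
        fut = asyncio.get_running_loop().create_future()
        await self.queue.put((features, fut))
        return await fut  # resolved by the worker once the batch is scored

    async def worker(self) -> None:
        while True:
            items = [await self.queue.get()]  # block until at least one request arrives
            await asyncio.sleep(self.max_wait)  # brief window for concurrent requests to arrive
            while len(items) < self.max_batch and not self.queue.empty():
                items.append(self.queue.get_nowait())
            batch = [features for features, _ in items]
            # Run the (possibly blocking) model call in a thread so the loop stays responsive.
            scores = await asyncio.to_thread(self.model.predict, batch)
            self.batch_sizes.append(len(batch))
            for (_, fut), score in zip(items, scores):
                if not fut.done():
                    fut.set_result(score)


async def demo(n_requests: int = 100) -> tuple[list[float], list[int]]:
    batcher = MicroBatcher(DummyModel(), max_batch=32)
    worker = asyncio.create_task(batcher.worker())
    requests = [[float(i), 1.0] for i in range(n_requests)]
    scores = await asyncio.gather(*(batcher.predict(r) for r in requests))
    worker.cancel()  # shut the background worker down cleanly
    with contextlib.suppress(asyncio.CancelledError):
        await worker
    return scores, batcher.batch_sizes


if __name__ == "__main__":
    scores, sizes = asyncio.run(demo())
    print(scores[:3], sizes)
```

The trade-off is a small added latency (`max_wait`) per request in exchange for far fewer model invocations, which usually matters most when the model runs on a GPU or accelerator.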

Case Study: Real-time Fraud Detection System (7 minutes)

  • Problem statement: building a real-time fraud detection system for financial transactions.
  • Synchronous approach: challenges and limitations.
  • Asynchronous approach: implementation and performance gains.
  • Code walkthrough, performance comparison, and analysis.
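The synchronous-versus-asynchronous comparison in the case study can be illustrated with a toy model. Everything here is a hypothetical assumption for illustration: `USER_AVG` stands in for a feature store, `LOOKUP_LATENCY` is a made-up per-call latency, and a simple "amount above 3× the user's average" rule replaces a trained fraud model. With these made-up numbers, the sequential version takes roughly `n × LOOKUP_LATENCY`, while the async version takes roughly `(n / max_concurrency) × LOOKUP_LATENCY`; an `asyncio.Semaphore` caps in-flight lookups so the backend is not overwhelmed.

```python
import asyncio
import time
from dataclasses import dataclass

LOOKUP_LATENCY = 0.05  # hypothetical latency (seconds) of a feature-store / API call


@dataclass
class Transaction:
    txn_id: int
    user_id: int
    amount: float


# Hypothetical per-user historical average spend, standing in for a feature store.
USER_AVG = {0: 40.0, 1: 120.0, 2: 15.0}


def score(txn: Transaction, avg: float) -> bool:
    """Toy rule in place of a trained model: flag amounts far above the user's average."""
    return txn.amount > 3 * avg


def lookup_sync(user_id: int) -> float:
    time.sleep(LOOKUP_LATENCY)  # blocks the whole thread
    return USER_AVG[user_id]


async def lookup_async(user_id: int) -> float:
    await asyncio.sleep(LOOKUP_LATENCY)  # yields to the event loop instead of blocking
    return USER_AVG[user_id]


def detect_sync(txns: list[Transaction]) -> list[int]:
    # One lookup at a time: total latency grows linearly with the number of transactions.
    return [t.txn_id for t in txns if score(t, lookup_sync(t.user_id))]


async def detect_async(txns: list[Transaction], max_concurrency: int = 10) -> list[int]:
    sem = asyncio.Semaphore(max_concurrency)  # cap in-flight lookups to protect the backend

    async def check(t: Transaction) -> bool:
        async with sem:
            avg = await lookup_async(t.user_id)
        return score(t, avg)

    flags = await asyncio.gather(*(check(t) for t in txns))
    return [t.txn_id for t, flagged in zip(txns, flags) if flagged]


def make_transactions(n: int) -> list[Transaction]:
    # Synthetic data: every 7th transaction has an outlier amount, so some get flagged.
    return [Transaction(i, i % 3, 500.0 if i % 7 == 0 else 20.0 + i % 5) for i in range(n)]


if __name__ == "__main__":
    txns = make_transactions(40)
    t0 = time.perf_counter()
    sync_flags = detect_sync(txns)
    t1 = time.perf_counter()
    async_flags = asyncio.run(detect_async(txns))
    t2 = time.perf_counter()
    assert sync_flags == async_flags  # same decisions, different latency profile
    print(f"flagged: {sync_flags}")
    print(f"sync: {t1 - t0:.2f}s  async: {t2 - t1:.2f}s")
```

Real gains depend on how much of each request is spent waiting on I/O; CPU-heavy scoring would still need an executor, as in the preprocessing stage.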

Q&A and Discussion (5 minutes)

Prerequisites:

  • Intermediate to advanced Python programming skills.
  • Basic understanding of machine learning concepts.
  • Basic familiarity with asynchronous programming and the asyncio library is recommended.
  • Experience with ML workflows and data processing techniques.

Speaker Info:

I work as a Senior Software Engineer at an MNC, where I am responsible for validating machine learning models on-device. My work involves developing and adding features to a Python framework that covers SDK inference, benchmarking, and stress testing of ML models. I also have experience with compression algorithms and enjoy exploring the visualization and analysis of memory usage patterns.

Previously, I worked with a few startups focusing on machine learning and computer vision. With a keen eye for innovation and problem-solving, I love sharing my knowledge and insights with others. Let's explore the exciting realm of technology together!

Speaker Links:

GitHub, LinkedIn

Blog link: https://towardsdatascience.com/system-failure-prediction-using-log-analysis-8eab84d56d1

Previous talks: PyCon lightning talk 2023: https://www.linkedin.com/posts/animesh145_pyconindia-datascience-pythoncommunity-activity-7114569968884621313-axkl?utm_source=share&utm_medium=member_desktop

Computer Vision DevCon 2020: https://analyticsindiamag.com/leveraging-computer-vision-in-drone-tech/

Section: Artificial Intelligence and Machine Learning
Type: Talk
Target Audience: Intermediate