Asynchronous Programming for Scalable Machine Learning Pipelines in Python

Animesh Dutta (~animesh1) | 29 May, 2024

Description:

This session explores in depth how asynchronous programming techniques can be integrated with machine learning workflows to build scalable, high-performance ML pipelines. I will cover advanced concepts and practical applications of Python's asyncio library for optimizing data ingestion, preprocessing, and model inference. A detailed real-world use case will demonstrate how to build an end-to-end asynchronous machine learning pipeline.

Objectives:

Understand the role of asynchronous programming in optimizing complex machine learning workflows.
Explore advanced features of the asyncio library and their application in ML tasks.
Implement asynchronous data ingestion, preprocessing, and model inference in an ML pipeline.
Showcase a detailed, practical use case of an end-to-end asynchronous ML pipeline.
Illustrate the performance improvements and scalability benefits of asynchronous programming in machine learning.

Session Outline (30 minutes)

Introduction to Asynchronous Programming in ML (5 minutes)

Importance of scalable pipelines and overview of asynchronous programming benefits in ML

Deep Dive into asyncio for ML Pipelines (5 minutes)

Advanced asyncio components: Event loop, tasks, and coroutines. Concurrent data processing using asyncio.
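
As a rough illustration of how these three pieces fit together, here is a minimal sketch using only the standard library. The `load_record` coroutine and its simulated delay are hypothetical stand-ins for real I/O, not material from the talk.

```python
import asyncio


async def load_record(record_id: int) -> dict:
    """Coroutine standing in for an I/O-bound fetch (hypothetical)."""
    await asyncio.sleep(0.01)  # simulated network or disk wait
    return {"id": record_id, "value": record_id * 2}


async def load_all(record_ids: list[int]) -> list[dict]:
    # create_task schedules each coroutine on the running event loop right away.
    tasks = [asyncio.create_task(load_record(i)) for i in record_ids]
    # gather waits for all tasks and returns their results in input order.
    return await asyncio.gather(*tasks)


if __name__ == "__main__":
    # asyncio.run starts an event loop, runs the coroutine, then closes the loop.
    print(asyncio.run(load_all([1, 2, 3])))
```

Because the three waits overlap, the whole call takes roughly as long as one fetch rather than three.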

Asynchronous Data Ingestion and Preprocessing (4 minutes)

Implementing asynchronous data ingestion from various sources. Parallel preprocessing of large data batches.
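
One possible shape for this stage is a producer/consumer pipeline built on `asyncio.Queue`. The sketch below is an assumption about how such a stage could look, not the talk's implementation: `ingest`, `normalize`, and `run_pipeline` are hypothetical names, and the sources only simulate I/O with a short sleep. Blocking preprocessing is pushed to worker threads with `asyncio.to_thread` (Python 3.9+); for heavily CPU-bound work a process pool may be a better fit.

```python
import asyncio


async def ingest(source: str, n_batches: int, queue: asyncio.Queue) -> None:
    """Hypothetical source reader: pushes small batches onto a shared queue."""
    for i in range(n_batches):
        await asyncio.sleep(0.01)  # simulated I/O wait
        await queue.put([float(i), float(i + 1)])


def normalize(batch: list[float]) -> list[float]:
    """Blocking preprocessing step (placeholder for real feature code)."""
    top = max(batch) or 1.0  # avoid dividing by zero
    return [x / top for x in batch]


async def preprocess(queue: asyncio.Queue, out: list) -> None:
    while True:
        batch = await queue.get()
        try:
            # Run the blocking function in a worker thread so the loop stays responsive.
            out.append(await asyncio.to_thread(normalize, batch))
        finally:
            queue.task_done()


async def run_pipeline(sources: list[str], n_batches: int = 3) -> list[list[float]]:
    queue: asyncio.Queue = asyncio.Queue(maxsize=8)  # bounded, so producers feel backpressure
    out: list[list[float]] = []
    workers = [asyncio.create_task(preprocess(queue, out)) for _ in range(2)]
    # All sources are read concurrently.
    await asyncio.gather(*(ingest(s, n_batches, queue) for s in sources))
    await queue.join()  # wait until every queued batch has been processed
    for worker in workers:
        worker.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return out


if __name__ == "__main__":
    print(asyncio.run(run_pipeline(["source_a", "source_b"])))
```

The bounded queue is the main design choice here: if preprocessing falls behind, `queue.put` waits instead of letting batches pile up in memory.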

Asynchronous Model Inference and Deployment (4 minutes)

Asynchronous model inference for high-throughput prediction. Integrating asynchronous inference with FastAPI for real-time services.
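
A minimal sketch of the inference side, assuming the model exposes a blocking `predict` method. `DummyModel` and `predict_many` are hypothetical names used only for illustration. The sketch uses the standard library alone; in a framework such as FastAPI, an `async def` request handler could await a function like this, but that integration is not shown here.

```python
import asyncio


class DummyModel:
    """Placeholder for a real model whose predict() call blocks."""

    def predict(self, features: list[float]) -> float:
        return sum(features) / len(features)


async def predict_many(
    model: DummyModel, requests: list[list[float]], max_concurrency: int = 4
) -> list[float]:
    # The semaphore caps how many predictions run at once.
    limit = asyncio.Semaphore(max_concurrency)

    async def predict_one(features: list[float]) -> float:
        async with limit:
            # Off-load the blocking call so other requests keep being served.
            return await asyncio.to_thread(model.predict, features)

    return await asyncio.gather(*(predict_one(f) for f in requests))


if __name__ == "__main__":
    print(asyncio.run(predict_many(DummyModel(), [[1.0, 3.0], [2.0, 2.0]])))
```

Capping concurrency keeps a burst of requests from exhausting the thread pool or the memory of the underlying model runtime.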

Case Study: Real-time Fraud Detection System (7 minutes)

Problem statement: building a real-time fraud detection system for financial transactions. Synchronous approach: challenges and limitations. Asynchronous approach: implementation and performance gains. Code walkthrough. Performance comparison and analysis.
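
To make the comparison concrete, the sketch below times a one-at-a-time scorer against a concurrent one. Everything in it is an assumption made for illustration: `score_transaction`, the threshold of 1000.0, and the 20 ms simulated latency are hypothetical and do not represent the case study's code or its measured results.

```python
import asyncio
import time


async def score_transaction(amount: float) -> bool:
    """Hypothetical fraud check: flags amounts above an arbitrary threshold."""
    await asyncio.sleep(0.02)  # stands in for a feature lookup or model call
    return amount > 1000.0


async def score_sequentially(amounts: list[float]) -> list[bool]:
    # One at a time: total wait is roughly len(amounts) * latency.
    return [await score_transaction(a) for a in amounts]


async def score_concurrently(amounts: list[float]) -> list[bool]:
    # Waits overlap: total wait is roughly a single latency period.
    return list(await asyncio.gather(*(score_transaction(a) for a in amounts)))


async def compare(amounts: list[float]):
    start = time.perf_counter()
    sequential = await score_sequentially(amounts)
    sequential_time = time.perf_counter() - start

    start = time.perf_counter()
    concurrent = await score_concurrently(amounts)
    concurrent_time = time.perf_counter() - start
    return sequential, concurrent, sequential_time, concurrent_time


if __name__ == "__main__":
    seq, con, seq_t, con_t = asyncio.run(compare([50.0, 2500.0, 999.0]))
    print(seq == con, f"sequential={seq_t:.3f}s concurrent={con_t:.3f}s")
```

Both scorers return the same flags; only the elapsed time differs, and the gap grows with the number of transactions and the per-call latency.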

Q&A and Discussion (5 minutes)

Prerequisites:

- Intermediate to advanced Python programming skills.
- Basic understanding of machine learning concepts.
- Basic familiarity with asynchronous programming and the asyncio library is recommended.
- Experience with ML workflows and data processing techniques.

Speaker Info:

I work as a Senior Software Engineer at an MNC, where I am responsible for validating machine learning models on-device. My work involves developing and adding features to a Python framework that covers SDK inference, benchmarking, and stress testing of ML models. I also have experience working with compression algorithms and enjoy exploring the visualization and analysis of memory usage patterns.

Prior to this, I have worked with a few startups focussing on Machine Learning and Computer Vision. With a keen eye for innovation and problem-solving, I love sharing my knowledge and insights with others. Let's explore the exciting realm of technology together!

Speaker Links:

Github LinkedIn

Blog link: https://towardsdatascience.com/system-failure-prediction-using-log-analysis-8eab84d56d1

Previous talks: PyCon lightning talk 2023: https://www.linkedin.com/posts/animesh145_pyconindia-datascience-pythoncommunity-activity-7114569968884621313-axkl?utm_source=share&utm_medium=member_desktop

Computer Vision DevCon 2020: https://analyticsindiamag.com/leveraging-computer-vision-in-drone-tech/

Section:	Artificial Intelligence and Machine Learning
Type:	Talk
Target Audience:	Intermediate
Last Updated:	25 Sep, 2024