Asynchronous Programming for Scalable Machine Learning Pipelines in Python
Animesh Dutta (~animesh1)
Description:
This session explores how asynchronous programming can be combined with machine learning workflows to build scalable, high-performance ML pipelines. I will cover advanced concepts and practical applications of Python's asyncio library for optimizing data ingestion, preprocessing, and model inference, and walk through a detailed real-world case study: an end-to-end asynchronous ML pipeline.
Objectives:
- Understand the role of asynchronous programming in optimizing complex machine learning workflows.
- Explore advanced features of the asyncio library and their application in ML tasks.
- Implement asynchronous data ingestion, preprocessing, and model inference in an ML pipeline.
- Showcase a detailed, practical use case of an end-to-end asynchronous ML pipeline.
- Illustrate the performance improvements and scalability benefits of asynchronous programming in machine learning.
Session Outline (30 minutes)
Introduction to Asynchronous Programming in ML (5 minutes)
- Importance of scalable pipelines and overview of asynchronous programming benefits in ML
Deep Dive into asyncio for ML Pipelines (5 minutes)
- Advanced asyncio components: event loop, tasks, and coroutines; concurrent data processing using asyncio.
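As a minimal sketch of how these three pieces fit together (the `preprocess` and `run_all` names are hypothetical, and `asyncio.sleep` stands in for real I/O), each coroutine is wrapped in a task, and the event loop started by `asyncio.run` interleaves them:

```python
import asyncio


async def preprocess(record: int) -> int:
    # Coroutine: stand-in for an I/O-bound step. Awaiting asyncio.sleep
    # hands control back to the event loop so other tasks can run.
    await asyncio.sleep(0.01)
    return record * 2


async def run_all(records: list) -> list:
    # Task: create_task schedules each coroutine on the running event loop.
    tasks = [asyncio.create_task(preprocess(r)) for r in records]
    # gather waits for every task and returns results in input order.
    return await asyncio.gather(*tasks)


if __name__ == "__main__":
    # Event loop: asyncio.run creates one, runs the coroutine, and closes it.
    print(asyncio.run(run_all([1, 2, 3])))
```

Because the three waits overlap, the total wall time is close to one `sleep` rather than three.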
Asynchronous Data Ingestion and Preprocessing (4 minutes)
- Implementing asynchronous data ingestion from various sources and parallel preprocessing of large data batches.
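One possible shape for this stage, sketched with the standard library only: several producers feed a bounded `asyncio.Queue`, and a consumer groups records into batches and hands each batch to a worker thread. The source names, the `normalize` step, and the batch size are all illustrative assumptions, not the talk's actual pipeline.

```python
import asyncio


async def read_source(name: str, n: int, queue: asyncio.Queue) -> None:
    # Hypothetical source: emits n records, pausing to mimic network or disk latency.
    for i in range(n):
        await asyncio.sleep(0.001)
        await queue.put((name, i))


def normalize(batch: list) -> list:
    # Blocking stand-in for real preprocessing (scaling, encoding, and so on).
    return [(name, value / 10) for name, value in batch]


async def preprocess_batches(queue: asyncio.Queue, batch_size: int) -> list:
    out, batch = [], []
    while True:
        item = await queue.get()
        if item is None:  # sentinel: every source is exhausted
            break
        batch.append(item)
        if len(batch) == batch_size:
            # Run the blocking step in a thread so ingestion keeps flowing.
            out.extend(await asyncio.to_thread(normalize, batch))
            batch = []
    if batch:  # flush the final partial batch
        out.extend(await asyncio.to_thread(normalize, batch))
    return out


async def ingest(sources: dict, batch_size: int = 4) -> list:
    # A bounded queue applies backpressure when preprocessing falls behind.
    queue: asyncio.Queue = asyncio.Queue(maxsize=16)
    consumer = asyncio.create_task(preprocess_batches(queue, batch_size))
    await asyncio.gather(*(read_source(s, n, queue) for s, n in sources.items()))
    await queue.put(None)
    return await consumer


if __name__ == "__main__":
    print(asyncio.run(ingest({"api": 5, "files": 3})))
```

`asyncio.to_thread` (Python 3.9+) keeps the event loop responsive, but pure-Python CPU work in threads is still limited by the GIL; for truly parallel preprocessing, `loop.run_in_executor` with a `concurrent.futures.ProcessPoolExecutor` is a common alternative.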
Asynchronous Model Inference and Deployment (4 minutes)
- Asynchronous model inference for high-throughput prediction; integrating asynchronous inference with FastAPI for real-time services.
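A hedged sketch of the inference side, again using only the standard library: a blocking `predict` call is moved off the event loop, and a semaphore caps how many predictions run at once. `DummyModel`, `predict_async`, and `predict_many` are hypothetical names; a real model and its concurrency limit would differ.

```python
import asyncio


class DummyModel:
    # Hypothetical model with a blocking predict(), standing in for a real one.
    def predict(self, features: list) -> float:
        return sum(features) / len(features)


async def predict_async(model, features: list, limit: asyncio.Semaphore) -> float:
    # The semaphore bounds concurrent predictions so the worker threads
    # (or an accelerator behind them) are not oversubscribed.
    async with limit:
        return await asyncio.to_thread(model.predict, features)


async def predict_many(model, requests: list, max_concurrency: int = 8) -> list:
    limit = asyncio.Semaphore(max_concurrency)
    # Results come back in the same order as the incoming requests.
    return await asyncio.gather(*(predict_async(model, f, limit) for f in requests))


if __name__ == "__main__":
    batch = [[1.0, 3.0], [2.0, 2.0, 2.0]]
    print(asyncio.run(predict_many(DummyModel(), batch)))
```

In a FastAPI service, a coroutine like `predict_async` could be awaited from an `async def` route handler, so slow predictions do not block other requests on the same event loop.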
Case Study: Real-time Fraud Detection System (7 minutes)
- Problem statement: building a real-time fraud detection system for financial transactions.
- Synchronous approach: challenges and limitations.
- Asynchronous approach: implementation and performance gains.
- Code walkthrough, performance comparison, and analysis.
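To illustrate the kind of comparison involved, here is a toy sketch rather than the case study itself: each transaction needs one slow lookup, simulated with a fixed `LATENCY`, before a placeholder rule flags it. The threshold rule, the latency value, and all function names are assumptions made for illustration.

```python
import asyncio
import time

LATENCY = 0.01  # assumed per-transaction lookup latency, in seconds


def is_suspicious(amount: float) -> bool:
    # Toy rule standing in for a trained fraud model.
    return amount > 1000


def score_sync(amounts: list) -> list:
    flags = []
    for amount in amounts:
        time.sleep(LATENCY)  # blocking lookup: transactions wait in line
        flags.append(is_suspicious(amount))
    return flags


async def score_one(amount: float) -> bool:
    await asyncio.sleep(LATENCY)  # non-blocking lookup: waits overlap
    return is_suspicious(amount)


async def score_async(amounts: list) -> list:
    return await asyncio.gather(*(score_one(a) for a in amounts))


def timed(fn, *args):
    # Returns (result, elapsed seconds) for a single call.
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    amounts = [50.0, 2500.0, 999.0, 1200.0] * 5
    sync_flags, sync_s = timed(score_sync, amounts)
    async_flags, async_s = timed(lambda a: asyncio.run(score_async(a)), amounts)
    print(f"sync: {sync_s:.3f}s  async: {async_s:.3f}s  same flags: {sync_flags == async_flags}")
```

The synchronous version pays the latency once per transaction, while the asynchronous version overlaps the waits. The gain applies to I/O-bound steps like this one; CPU-bound scoring would not speed up from `asyncio` alone.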
Q&A and Discussion (5 minutes)
Prerequisites:
- Intermediate to advanced Python programming skills.
- Basic understanding of machine learning concepts.
- Basic familiarity with asynchronous programming and the asyncio library is recommended.
- Experience with ML workflows and data processing techniques.
Speaker Info:
I work as a Senior Software Engineer at an MNC, where I am responsible for on-device validation of machine learning models. My work involves developing and adding features to a Python framework that covers SDK inference, benchmarking, and stress testing of ML models. I also have experience working with compression algorithms and enjoy exploring the visualization and analysis of memory usage patterns.
Prior to this, I have worked with a few startups focussing on Machine Learning and Computer Vision. With a keen eye for innovation and problem-solving, I love sharing my knowledge and insights with others. Let's explore the exciting realm of technology together!
Speaker Links:
Blog link: https://towardsdatascience.com/system-failure-prediction-using-log-analysis-8eab84d56d1
Previous talks: PyCon lightning talk 2023: https://www.linkedin.com/posts/animesh145_pyconindia-datascience-pythoncommunity-activity-7114569968884621313-axkl?utm_source=share&utm_medium=member_desktop
Computer Vision DevCon 2020: https://analyticsindiamag.com/leveraging-computer-vision-in-drone-tech/