Mastering Multi-Model Deployment: Ray Serve Strategies for Low Latency

SIDDHARTH SAHANI (~siddharth8)

Description:

In today's fast-paced business environment, serving numerous machine learning models has become essential for meeting diverse business needs and customized use cases. However, this brings the challenge of deploying and managing these models efficiently while keeping them easy to use and cost-effective. This talk provides a comprehensive overview of the different patterns for serving many models with Ray Serve, a scalable model serving library built on Ray.

We will explore how three key features of Ray Serve—model composition, multi-application, and model multiplexing—enable seamless deployment of numerous models while optimizing resource utilization. Attendees will gain an understanding of common industry patterns for serving many models and learn how to simplify management and enhance performance through Ray Serve's unique capabilities. The session will also delve into real-world case studies, showcasing how Ray Serve users run many-model applications in production, highlighting the practical benefits and performance improvements achieved.
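Of the three features, model multiplexing is the one most directly tied to resource utilization. Each replica loads models on demand by model ID and evicts the least recently used one when it reaches a per-replica limit. In Ray Serve this is expressed with the `@serve.multiplexed(max_num_models_per_replica=...)` decorator together with `serve.get_multiplexed_model_id()`. The sketch below is plain Python so that it runs without a Ray cluster, and it only mimics that caching behaviour. The `MultiplexedReplica` class, `fake_loader`, and the `model-<n>` ID format are hypothetical illustrations, not Ray Serve's implementation.

```python
from collections import OrderedDict
from typing import Callable, List

# A "model" in this toy sketch is just a function from float to float.
Model = Callable[[float], float]


class MultiplexedReplica:
    """Hypothetical stand-in for one replica that multiplexes many models.

    Models are loaded lazily by ID and at most `max_models` stay in memory;
    when the limit is reached, the least recently used model is evicted.
    """

    def __init__(self, loader: Callable[[str], Model], max_models: int = 3):
        if max_models < 1:
            raise ValueError("max_models must be at least 1")
        self.loader = loader
        self.max_models = max_models
        # Ordered from least recently used (front) to most recently used (back).
        self._models: "OrderedDict[str, Model]" = OrderedDict()
        self.load_count = 0  # number of (expensive) loads performed so far

    def get_model(self, model_id: str) -> Model:
        if model_id in self._models:
            # Cache hit: mark this model as the most recently used.
            self._models.move_to_end(model_id)
            return self._models[model_id]
        # Cache miss: free a slot by evicting the least recently used model.
        if len(self._models) >= self.max_models:
            self._models.popitem(last=False)
        model = self.loader(model_id)
        self.load_count += 1
        self._models[model_id] = model
        return model

    def predict(self, model_id: str, x: float) -> float:
        return self.get_model(model_id)(x)

    @property
    def loaded(self) -> List[str]:
        return list(self._models)


def fake_loader(model_id: str) -> Model:
    # Hypothetical loader: "model-<n>" becomes a function that multiplies by n.
    factor = float(model_id.split("-")[1])
    return lambda x: x * factor


replica = MultiplexedReplica(fake_loader, max_models=2)
print(replica.predict("model-2", 10.0))  # 20.0 (loaded)
print(replica.predict("model-3", 10.0))  # 30.0 (loaded)
print(replica.predict("model-2", 1.0))   # 2.0 (cache hit, model-2 becomes most recent)
print(replica.predict("model-5", 1.0))   # 5.0 (evicts model-3, the least recent)
print(replica.loaded)                     # ['model-2', 'model-5']
print(replica.load_count)                 # 3
```

The per-replica limit trades memory for cold-start latency: any request for a model that is not loaded pays the full load cost, so it pays to send requests for the same model ID to a replica that already holds it.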

Outline:

1. Introduction
  • Importance of serving numerous models
  • Challenges in deployment and management
2. Ray Serve Overview
  • Introduction to Ray Serve and its significance
  • Key features: model composition, multi-application, and model multiplexing
3. Key Features for Many-Model Serving
  • Model Composition: simplifying deployment and management
  • Multi-Application: catering to diverse use cases
  • Model Multiplexing: optimizing resource utilization
4. Case Studies
  • Real-world applications and performance benchmarks
  • Insights from Ray Serve users
5. Conclusion and Q&A
  • Recap of key points

Prerequisites:

  • Basic understanding of machine learning and model deployment
  • Familiarity with model serving tools
  • Knowledge of scalability challenges
  • Programming experience, particularly in Python
  • Interest in real-world applications of machine learning

Video URL:

https://drive.google.com/file/d/12w_wvfs4MkbARCF6MuaM3nRq4o6cAP3X/view?usp=sharing

Speaker Info:

As a Senior Machine Learning Engineer at Kayzen, Siddharth is spearheading the stability of AI infrastructure and model monitoring at scale. He builds and deploys models that help advertisers and media buyers on the demand side succeed with their programmatic ad spend. He is also responsible for designing and implementing several highly effective efficiency measures, including stabilising big data storage and processing and heavily optimising the data and model pipelines.

He has also been a conference submissions reviewer for IEEE for over 4 years. Previously, he worked for the Ahmedabad-based start-ups Shipmnts and Infocusp. Siddharth is a gold medalist from SRM University and collaborated with IIT Madras through a coveted research fellowship. He brings 8 years of experience in enterprise and cloud architecture.

He leverages this deep understanding of system architecture and machine learning to build scalable, reproducible ML architectures across industry verticals, from ed-tech to legal-tech to logistics and supply chain and, most recently, ad-tech. Siddharth received the IET Trendsetter of the Year Award in both 2015 and 2016, and the Amul Vidya Bhushan Award in 2012. He has mentored hundreds of data science enthusiasts, from school students to industry leaders, through platforms such as The Climber, GreatLakes and Scaler. Besides serving as a speaker and judge at various seminars, conferences and panel discussions, he is a keen robotics enthusiast and is trained in contemporary dance.

Speaker Links:

Socials
  • LinkedIn: https://www.linkedin.com/in/siddharthsahani/
  • GitHub: https://github.com/dapperlabel
  • Stack Overflow: https://stackoverflow.com/users/6649426/siddharth-sahani
  • Medium: https://medium.com/@siddharthsahani7

Links to previous talks
  1. AIM MLDS 2024, Bengaluru, 2 February 2024: https://www.youtube.com/watch?v=SzUeydIsVPs&list=PL1Osdi_5mfH6jNSR5iDl0J43GDfpEIFMV&index=25
  2. IEEE Kaagaz Conference, Hyderabad, 4-5 November 2022: https://www.linkedin.com/feed/update/urn:li:activity:7001762071558148096/?updateEntityUrn=urn%3Ali%3Afs_feedUpdate%3A%28V2%2Curn%3Ali%3Aactivity%3A7001762071558148096%29
  3. IEEE Epsilon, Online, April 2021: https://www.youtube.com/watch?v=Gvpua_zFxy4
  4. SRM Alumni Webinar, Online, May 2020: https://www.youtube.com/watch?v=z09m2xDPzVw

Section: Artificial Intelligence and Machine Learning
Type: Talk
Target Audience: Intermediate