Model deployment with Nvidia Triton: deploying LLMs

Rohit Gupta (~rohitgr7)


Description:

In this talk, we will dive into the world of model deployment with Nvidia Triton and explore the nuances of deploying LLMs, such as GPT-3, GPT-4, and beyond. Nvidia Triton, formerly known as TensorRT Inference Server, is an open-source software platform designed to streamline the deployment of AI models in production environments. It provides a robust and scalable solution for serving deep learning models, including LLMs, with high throughput and low latency.

During the session, we will begin by discussing the challenges associated with deploying LLMs and how Triton addresses these challenges. We will explore the architecture and features of Triton that make it well-suited for deploying LLMs in real-world applications. From model versioning and management to efficient resource utilization, Triton offers a comprehensive solution for deploying LLMs at scale.

We will then move on to practical demonstrations, showcasing how to deploy LLMs using Triton. Participants will learn how to set up a Triton server, configure models, and handle various input and output formats. We will discuss the best practices for optimizing LLM deployment, including batching, concurrent requests, and GPU utilization.
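To give a taste of what the demo covers, a minimal Triton deployment consists of a model repository with a per-model `config.pbtxt`. The sketch below is illustrative only: the model name `llm_model`, the ONNX backend, and the tensor names and shapes are placeholder assumptions, while the `dynamic_batching` and `instance_group` settings show the batching and GPU-utilization knobs the talk discusses:

```
# Hypothetical model repository layout:
#   model_repository/
#   └── llm_model/
#       ├── config.pbtxt
#       └── 1/
#           └── model.onnx
#
# config.pbtxt (placeholder names, shapes, and types):
name: "llm_model"
backend: "onnxruntime"
max_batch_size: 8
input [
  {
    name: "input_ids"
    data_type: TYPE_INT64
    dims: [ -1 ]
  }
]
output [
  {
    name: "logits"
    data_type: TYPE_FP32
    dims: [ -1, -1 ]
  }
]
# Dynamic batching groups concurrent requests into larger batches
# to improve GPU utilization at a small latency cost.
dynamic_batching {
  max_queue_delay_microseconds: 100
}
# Run one model instance per GPU.
instance_group [ { kind: KIND_GPU, count: 1 } ]
```

The server can then be started from the official container, pointing it at the repository, e.g. `docker run --gpus all -v $PWD/model_repository:/models nvcr.io/nvidia/tritonserver:<version> tritonserver --model-repository=/models` (replace `<version>` with a release tag).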

Finally, we will touch upon the latest advancements and trends in LLM deployment, including mixed-precision training and inference, model pruning, and model parallelism. We will discuss how Triton integrates with these advancements to provide efficient and scalable solutions for deploying state-of-the-art LLMs.

Prerequisites:

  1. Since this topic applies to all kinds of ML models, high-level experience building models with PyTorch is sufficient.
  2. A basic understanding of how LLMs work, which most people will already have after ChatGPT.

Content URLs:

SLIDES: ~ (Will be modified later)

GitHub project: https://github.com/rohitgr7/triton_inference_cc

Speaker Info:

Rohit is the ML Lead at Mazaal AI. Previously, he worked at Lightning AI as a Research Engineer on the PyTorch Lightning project. He has contributed to several open-source projects and is mostly interested in open source, ML products, and getting ML research into production.

Speaker Links:

  1. GitHub: https://github.com/rohitgr7
  2. Twitter: https://twitter.com/imgrohit
  3. LinkedIn: https://linkedin.com/in/rohitgr7

Section: Data Science, AI & ML
Type: Talks
Target Audience: Intermediate
Last Updated: