Evaluation Techniques for Large Language Models and Retrieval-Augmented Generation

GauriDhande



Description:

Do you know how well your LLM is performing?

Large language models (LLMs) such as GPT and Gemini have been around for a while. They have been trained on huge amounts of data and, as a result, put a wide range of information at our fingertips.

However, the key concern is whether the information these LLMs provide is valid. In many situations their knowledge proves to be out of context, and they hallucinate or generate gibberish. There is also a need to evaluate the RAG pipeline to check whether the correct data is retrieved from the provided data.

In this talk, let's take a deep dive into the various evaluation techniques for LLMs and RAG systems.

The outline for the talk is going to be as follows:

  1. Introduction

    • 1.1. Opening Remarks

    • 1.2. Objectives of the Talk

  2. Background and Concepts

    • 2.1. Large Language Models (LLMs)

    • 2.2. Retrieval-Augmented Generation (RAG)

  3. Evaluation Metrics for LLMs

    • 3.1. Intrinsic Evaluation Metrics

    • 3.2. Extrinsic Evaluation Metrics

  4. Evaluation Metrics for RAG

    • 4.1. Precision and Recall

    • 4.2. F1 Score

    • 4.3. Retrieval Specific Metrics

  5. Case Studies and Practical Examples

    • 5.1. Case Study 1: Evaluating a Chatbot LLM

    • 5.2. Case Study 2: Evaluating a RAG System for Document Retrieval

  6. Conclusion

  7. Q&A Session
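
To give a flavour of the metrics named in section 4 of the outline, here is a minimal sketch of precision, recall, F1 and reciprocal rank for a single retrieval query. It is an illustration only, not the speaker's implementation: the function names, the document IDs and the choice of reciprocal rank as the "retrieval specific" metric are all assumptions made for this example.

```python
from typing import Sequence, Set, Tuple


def precision_recall_f1(
    retrieved: Sequence[str], relevant: Set[str]
) -> Tuple[float, float, float]:
    """Set-based precision, recall and F1 for one query.

    `retrieved` holds the document IDs returned by the retriever and
    `relevant` holds the IDs a human judged relevant (both hypothetical).
    """
    retrieved_set = set(retrieved)
    hits = len(retrieved_set & relevant)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    # F1 is the harmonic mean of precision and recall; define it as 0 with no hits.
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


def reciprocal_rank(retrieved: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant document, or 0.0 if none was retrieved."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


if __name__ == "__main__":
    # Hypothetical retriever output (ranked) and ground-truth labels.
    retrieved_ids = ["d3", "d1", "d7", "d2"]
    relevant_ids = {"d1", "d2", "d5"}
    p, r, f1 = precision_recall_f1(retrieved_ids, relevant_ids)
    print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
    print(f"reciprocal rank={reciprocal_rank(retrieved_ids, relevant_ids):.3f}")
```

Averaging the reciprocal rank over many queries gives the mean reciprocal rank (MRR), which rewards retrievers that place a relevant document near the top, something set-based precision and recall do not capture.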

Prerequisites:

Familiarity with using LLMs and some experience with Python.

Speaker Info:

Gauri is a Senior Data Scientist at Allegion India. She holds a postgraduate degree and has around 5 years of experience. She works on LLMs and RAG to build smooth-running pipelines that help businesses make decisions faster. She has previously published 4 research papers across various domains of data science and machine learning.

Speaker Links:

https://www.linkedin.com/in/gauri-dhande-849928136/

https://medium.com/@dhandegauri

Section: Artificial Intelligence and Machine Learning
Type: Talk
Target Audience: Intermediate