Building Efficient RAG pipeline using Open Source LLMs

Tarun Jain (~lucifertrj)





Large Language models are all over the place, driving the advancement of AI in today's era. For enterprises and businesses, integrating LLM with custom data sources is crucial to provide more contextual understanding and reduce hallucinations. In my talk, I'll emphasize on building an effective RAG pipeline for production using Open Source LLMs. In simple words, Retrieval Augmented Generation involves retrieving relevant documents as context for user queries and leveraging LLMs to generate more accurate responses.

Problem Statement

  • Closed-source models like GPT, Claude, and Gemini demonstrate significant potential as LLMs, but enterprises and startups with sensitive data hesitate to rely on them due to data privacy and security concerns.
  • While numerous solutions and resources on the internet utilize closed-source models like GPT and Gemini to construct RAG pipelines, there is limited information available on building effective RAG pipelines using Open Source LLMs.
  • When it comes to using Open Source LLM, it is important to understand the prompt template to use to get response in specific format. While those with a basic grasp of Transformers can adjust parameters to enhance results, this approach may not be suitable for everyone.
  • Basic RAG solutions often struggle and tend to produce hallucinations.

Workshop Outline

I propose conducting a fully hands-on workshop, where participants will construct an entire RAG pipeline using open-source: LLMs, vector databases, and embeddings. Additionally, I will demonstrate two advanced techniques aimed at improving results from LLMs. Below is the outline of my workshop talk:

  • Issues with Large Language Models
  • Understand the need of RAG and Open Source LLMs
  • Prompt Engineering Basics - Zero Shot and Few Shot
  • Open Source LLMs parameters tour: Understanding temperature, top_p and so on.
  • Building basic RAG pipeline using Open Source LLMs, embeddings, and vector stores.
  • Advanced Technique: 1- Using Cross Encoders Sentence transformers to Re-rank
  • Advanced Technique: 2- Fine Tune Embeddings for RAG and Hybrid Search
  • Build Streamlit app for your RAG application
  • Deploy it using secrets on

Key Takeaways after Workshop

  • Understanding the importance of Few shot prompting.
  • Understanding the need of Open Source Large language models.
  • How to build the entire RAG pipeline using only open source tools such as embeddings (Fast-Embed), LLM (Mistral-7b) and vector database (Qdrant/Chroma).
  • How to improve the performance of RAG using re-ranking, which again is using Open Source models.

Reference Slides:

I am attaching the reference demo slides to provide an overview to the selection team of how my slides will look:



  • Basics of Python Programming
  • Basics understanding of HuggingFace transformers library.
  • This talk is beneficial for both beginners and intermediates.

Video URL:

Speaker Info:

Tarun Jain is a Data Scientist at AI Planet, Google Developer Expert in AI/ML and Google Summer of Code 2024 contributor at Red Hen Labs.

  • At AI Planet, Tarun has built 8+ state of art AI models, core contributor to BeyondLLM and OpenAGI. Further I have built multiple PoCs for Finance, Banking and Education domain.
  • As a Google Developer Expert in AI/ML, Tarun has delivered 15+ talks around LLM, RAG, Fine tuning for DevFest, Build with AI and other Google for Developer collaborations.
  • Tarun is an active participants for GSoC'24 at Red Hen Lab working on building Multilingual News LLM.

Tarun also shares video on YouTube at AI with Tarun channel on RAG, LLM and Generative AI.

Section: Artificial Intelligence and Machine Learning
Type: Workshops
Target Audience: Intermediate
Last Updated: