Chat with Tables: Developing a Q&A System on Tabular Data Using Code Generative LLMs

Kumar, Abhijeet (~a655065_fidelity)


Description:

Generative AI powered by Large Language Models (LLMs) is being used to create novel text, images, and even videos. LLMs that specialize in generating code are already used in enterprise solutions such as GitHub Copilot, Gemini Code Assist by Google, watsonx by IBM, and Amazon Q Developer to boost developer productivity. Along similar lines, code generating LLMs can be used to build a solution that takes a natural language query, generates Python code to analyze the data, and returns insights.
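To make the idea concrete, here is a minimal sketch of the prompt-construction step. It assumes the LLM is asked to write pandas code against a DataFrame named `df` and to store its answer in a variable named `result`. The helper names (`describe_table`, `build_nl2py_prompt`), the sample CSV, and the prompt wording are illustrative assumptions and not the workshop's actual code. Only the standard library is used, so the schema summary can be built without loading pandas.

```python
import csv
import io
from itertools import islice

# Hypothetical sample data standing in for a user's CSV file.
SAMPLE_CSV = """region,product,units,revenue
North,Widget,120,2400.0
South,Gadget,80,3200.0
North,Gadget,45,1800.0
"""


def describe_table(csv_text: str, n_rows: int = 2) -> str:
    """Summarize a CSV as its column names plus a few sample rows."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    lines = ["Columns: " + ", ".join(header)]
    # islice takes only the first n_rows data rows after the header.
    lines += ["Sample row: " + ", ".join(row) for row in islice(reader, n_rows)]
    return "\n".join(lines)


def build_nl2py_prompt(question: str, csv_text: str) -> str:
    """Assemble an instruction prompt asking a code LLM for pandas code."""
    return (
        "You are given a pandas DataFrame named `df`.\n"
        f"{describe_table(csv_text)}\n"
        "Write Python code using pandas that answers the question below "
        "and stores the answer in a variable named `result`.\n"
        f"Question: {question}\n"
        "Code:\n"
    )


print(build_nl2py_prompt("What is the total revenue per region?", SAMPLE_CSV))
```

The returned string would then be sent to a code LLM. Passing column names plus a couple of sample rows gives the model the schema without sending the whole file.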

Why does it matter? Business users and non-technical analysts often need to quickly analyze or transform tabular data in spreadsheets for ad-hoc business intelligence. However, they may lack the programming knowledge to do these tasks themselves and therefore depend on data analysis teams. Speeding up their data analysis for insights and report generation has clear potential value. Imagine a solution that lets non-technical business users answer day-to-day queries in natural language.

Agenda: In this workshop, we will mainly demonstrate the following steps:


  1. Prompting a code generative LLM (NLQ: Natural Language Querying).
  2. Setting up an end-to-end Q&A process for tabular data such as CSV files.
  3. Generating insights from tabular data (CSVs).
  4. Techniques to improve NLQ (few-shot examples, pruning, validation, instructions, etc.).
  5. Setting up a quick Streamlit app.
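Steps 2 and 4 involve running LLM-generated code automatically, and a validation pass usually comes first. The following is a minimal sketch of such a pass. It assumes the generated code reads a list of row dictionaries named `rows` and assigns its answer to `result`. The sketch parses the code with Python's `ast` module, rejects imports and a few risky calls, and then executes the code with a reduced set of builtins. The hard-coded `GENERATED_CODE` string stands in for real model output, and the denylist and helper names are hypothetical. Restricting builtins this way is not a real security sandbox, so a production system would need stronger isolation.

```python
import ast

# Hypothetical denylist of calls the generated code may not make.
BLOCKED_CALLS = {"eval", "exec", "open", "compile", "input", "__import__"}

# Reduced builtins exposed to generated code (NOT a real security sandbox).
SAFE_BUILTINS = {"len": len, "sum": sum, "min": min, "max": max,
                 "float": float, "int": int, "round": round, "sorted": sorted}


def validate_code(code: str) -> list[str]:
    """Return a list of problems found in generated code (empty means OK)."""
    try:
        tree = ast.parse(code)
    except SyntaxError as exc:
        return [f"syntax error: {exc.msg} (line {exc.lineno})"]
    problems = []
    assigns_result = False
    for node in ast.walk(tree):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            problems.append("imports are not allowed")
        elif (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
              and node.func.id in BLOCKED_CALLS):
            problems.append(f"call to {node.func.id}() is not allowed")
        elif isinstance(node, ast.Attribute) and node.attr.startswith("__"):
            problems.append(f"dunder attribute {node.attr} is not allowed")
        elif isinstance(node, ast.Name):
            if node.id.startswith("__"):
                problems.append(f"dunder name {node.id} is not allowed")
            elif node.id == "result" and isinstance(node.ctx, ast.Store):
                assigns_result = True
    if not assigns_result:
        problems.append("code must assign a variable named `result`")
    return problems


def run_generated_code(code: str, rows: list[dict]) -> object:
    """Validate, then execute generated code and return its `result`."""
    problems = validate_code(code)
    if problems:
        raise ValueError("; ".join(problems))
    namespace = {"__builtins__": SAFE_BUILTINS, "rows": rows}
    exec(code, namespace)
    return namespace["result"]


# Stand-in for code a code LLM might return for "total revenue per region".
GENERATED_CODE = """
totals = {}
for row in rows:
    totals[row["region"]] = totals.get(row["region"], 0.0) + float(row["revenue"])
result = totals
"""

ROWS = [
    {"region": "North", "revenue": "2400.0"},
    {"region": "South", "revenue": "3200.0"},
    {"region": "North", "revenue": "1800.0"},
]

print(run_generated_code(GENERATED_CODE, ROWS))  # {'North': 4200.0, 'South': 3200.0}
```

Returning a list of problems instead of raising immediately means the same messages could be fed back to the LLM in a retry prompt.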

The workshop is designed so that attendees can practice the covered tasks hands-on. Participants can use Google Colab to perform most of the above tasks.

I will begin by covering the basics of code assist models and their usage. I will then build on this foundation to demonstrate how to develop a no-code/low-code natural language querying system. We will cover most of the agenda steps using HuggingFace and other useful libraries.

Audience: This workshop is intended for developers who are interested in building Gen AI use cases with code assist LLMs. Basic prior experience with LLMs is required. Basic to intermediate Python skills should be sufficient.

Outcomes: By the end of the workshop, participants will be able to:

  • Work with code generative Large Language Models.
  • Develop an end-to-end process for Natural Language Querying on tabular data.
  • Understand the practical aspects and challenges of such applications.

Materials: The workshop will provide participants with all the materials they need to complete the exercises, including workshop notebooks, datasets, and code.

Topics to be covered in the Workshop

  • State of Code LLMs (Talk)
  • Inferencing: Prompting code LLMs for a task
  • Development of NLQ Application: Design (Talk)
    • NL2Py
    • NL2SQL
  • Setting up an end-to-end Python process
    • Preparing prompts and generating code
    • Code execution: Automation
  • Techniques to improve NLQ with Tabular Data
    • Pruning metadata
    • Providing Few-Shot examples
    • Code Validation
    • Providing Domain Instructions
  • Demo of the Tool
  • Key Takeaways & Closing Talk
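Several of the improvement techniques in the topics list can be expressed as prompt-assembly helpers: pruning metadata, providing few-shot examples, and providing domain instructions. The following is a minimal sketch that assumes a naive keyword match between question words and underscore-separated column names. The column list, examples, instruction text, and function names are hypothetical. A real system might use embeddings or synonym lists instead, because a question about "sales" would not match a `revenue` column here.

```python
import re

# Hypothetical table schema and few-shot examples for illustration.
ALL_COLUMNS = ["order_id", "customer_name", "region", "product",
               "units_sold", "revenue", "discount_pct", "ship_date"]

FEW_SHOT_EXAMPLES = [
    ("How many orders are there?", "result = len(df)"),
    ("What is the average discount?", "result = df['discount_pct'].mean()"),
]


def prune_columns(question: str, columns: list[str]) -> list[str]:
    """Keep columns sharing a word with the question (naive keyword match)."""
    words = set(re.findall(r"[a-z0-9]+", question.lower()))
    kept = [col for col in columns if words & set(col.lower().split("_"))]
    return kept or columns  # fall back to the full schema if nothing matches


def build_few_shot_prompt(question: str, columns: list[str],
                          examples: list[tuple[str, str]],
                          instructions: str = "") -> str:
    """Assemble a prompt from pruned metadata, instructions and examples."""
    lines = ["You are given a pandas DataFrame named `df`.",
             "Relevant columns: " + ", ".join(prune_columns(question, columns))]
    if instructions:
        lines.append("Domain instructions: " + instructions)
    # Each worked example shows the model the expected question -> code format.
    for example_question, example_code in examples:
        lines += [f"Question: {example_question}", f"Code: {example_code}"]
    lines += [f"Question: {question}", "Code:"]
    return "\n".join(lines)


print(build_few_shot_prompt(
    "Which region had the highest revenue?", ALL_COLUMNS, FEW_SHOT_EXAMPLES,
    instructions="Revenue values are in USD.",
))
```

Pruning keeps prompts short for wide tables, and the fallback to the full schema avoids sending the model an empty column list.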

Prerequisites:

  1. Basic knowledge of using Python in Machine Learning.
  2. Understanding of Large Language Models.
  3. Familiarity with structured data such as CSV files, tables, etc.

Content URLs:

The workshop objectives and content outline can be found here: GitHub

Speaker Info:

I am a Director, Data Science, with 11+ years of relevant experience solving problems with advanced analytics, machine learning, and deep learning techniques. I started my career as a computer scientist at a central government research organization (Bhabha Atomic Research Center). There I did research in a variety of domains, including conversational speech, satellite imagery, and text. For the last 5 years, I have been working at Fidelity Investments, applying NLP, Gen AI, data science, and graph techniques to business use cases.

I have used Python throughout my career, both for solving data science problems and for research. I have published several academic and applied research papers and have participated in multiple conferences over the years. In the past, I have trained professionals in machine learning and served as a guest lecturer for the Machine Learning subject (MTech course) in the BITS, Pilani WILP program.

Speaker Links:

Know Me

  1. LinkedIn
  2. Github

PyCon 2023: Conducted Workshop

My Blog: appliedmachinelearning

Open Source Contributions:

  1. finbert-embedding
  2. classitransformers
  3. PhraseExtraction

Section: Artificial Intelligence and Machine Learning
Type: Workshops
Target Audience: Beginner