Unlock Data with Natural Language: Building Data Assistant for Business using Code LLMs

Kumar, Abhijeet (~a655065_fidelity)




Generative AI (LLMs) for code has become a popular and powerful tool for developers, helped by the rise of enterprise solutions such as GitHub Copilot, Amazon CodeWhisperer, Google Duet AI, etc. There are numerous open-source code-assist models that generate code in hundreds of programming languages, including Python, Java, and SQL. These models take English comments or instructions and write code for developers. Apart from improving developer productivity, these models can also be used to build an impactful Python-based NL-2-SQL solution in the data space. Such a solution:

  • Allows business users (non-technical) to answer day-to-day queries in natural language.
  • Speeds up data analysis for insights and report generation.
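
To make the NL-2-SQL idea concrete, here is a minimal sketch of the flow: the table schema and the user's question are written as SQL comments, a code model completes the query, and the query runs against the database. Everything named here is an assumption for illustration, not the implementation from the talk: `build_prompt`, `answer`, the `sales` table, and especially `fake_code_llm`, a hypothetical stub that returns a canned completion in place of a real code LLM call.

```python
import sqlite3


def build_prompt(schema: str, question: str) -> str:
    """Assemble a comment-style prompt that a code LLM could complete with SQL."""
    return (
        "-- SQLite schema:\n"
        f"{schema.strip()}\n"
        f"-- Question: {question.strip()}\n"
        "-- SQL query answering the question:\n"
        "SELECT"
    )


def fake_code_llm(prompt: str) -> str:
    """Hypothetical stand-in for a code LLM; returns a canned completion."""
    return " region, SUM(amount) AS total FROM sales GROUP BY region ORDER BY region;"


def answer(conn, schema, question, llm=fake_code_llm):
    """Build the prompt, let the model complete the query, then execute it."""
    prompt = build_prompt(schema, question)
    sql = "SELECT" + llm(prompt)  # the prompt ends with SELECT, so prepend it
    return sql, conn.execute(sql).fetchall()


# Illustrative data only: a tiny in-memory table.
SCHEMA = "CREATE TABLE sales (region TEXT, amount REAL);"
conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("East", 100.0), ("West", 250.0), ("East", 50.0)],
)

sql, rows = answer(conn, SCHEMA, "What are total sales per region?")
print(sql)
print(rows)
```

Swapping `fake_code_llm` for a call to an actual model is the only change needed to try this against a real code LLM; the prompt layout shown is one plausible choice among many.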

Why does it matter?

  • Many business users and financial analysts perform ad hoc analysis to answer business queries on the fly. They often struggle to pull data from data lakes and other required systems, which means more effort for analyst teams to produce the desired output.
  • Data governance teams in enterprises validate data migration processes with many test cases to ensure data quality. Developing these test cases takes extensive effort, since each one requires writing data queries to validate the migration.
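
As an illustration of the kind of validation query mentioned above, the sketch below profiles a source and a target table and reports any metrics that differ after a migration. The table names, column names, chosen metrics, and the helpers `table_profile` and `validate_migration` are all hypothetical; real test suites would likely check much more.

```python
import sqlite3


def table_profile(conn, table, key_col, amount_col):
    """Return (row_count, distinct_keys, null_keys, amount_sum) for one table."""
    # Identifiers cannot be bound as SQL parameters, so they are interpolated.
    # In this sketch they come from trusted code, never from user input.
    query = (
        f"SELECT COUNT(*), COUNT(DISTINCT {key_col}), "
        f"COALESCE(SUM(CASE WHEN {key_col} IS NULL THEN 1 ELSE 0 END), 0), "
        f"COALESCE(SUM({amount_col}), 0) FROM {table}"
    )
    return conn.execute(query).fetchone()


def validate_migration(conn, source, target, key_col, amount_col):
    """Compare both profiles; return {metric: (source_value, target_value)} for mismatches."""
    names = ("row_count", "distinct_keys", "null_keys", "amount_sum")
    src = table_profile(conn, source, key_col, amount_col)
    tgt = table_profile(conn, target, key_col, amount_col)
    # Exact comparison is fine for this toy data; real float sums may need a tolerance.
    return {n: (s, t) for n, s, t in zip(names, src, tgt) if s != t}


# Illustrative data only: the target table is missing one migrated row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src_orders (id INTEGER, amount REAL)")
conn.execute("CREATE TABLE tgt_orders (id INTEGER, amount REAL)")
conn.executemany("INSERT INTO src_orders VALUES (?, ?)", [(1, 10.0), (2, 20.0), (3, 30.0)])
conn.executemany("INSERT INTO tgt_orders VALUES (?, ?)", [(1, 10.0), (2, 20.0)])

mismatches = validate_migration(conn, "src_orders", "tgt_orders", "id", "amount")
print(mismatches)
```

Each such check is a small query, but a migration usually needs many of them per table, which is where generating the queries from natural language descriptions could save effort.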

Sridhar Ramaswamy, CEO of Snowflake Inc. (a cloud-based data company), said in a press briefing, “Our dream here is, within a year, to have an API that our customers can use so that business users can directly talk to data.” Natural language querying enables various stakeholders (including executives, employees, customers, prospects, and partners) to pose questions about data in natural language and receive relevant responses.

Outline of the Talk

  • Introduction to the problem, use case, and application (NL-2-SQL)
  • State of code LLMs
  • Generalist (prompt-based) vs. specialist (fine-tuned) models
  • Building the data assist capability: implementation diagram
  • Practical challenges and strategies to overcome them
  • Demo of the tool

Key Takeaways: By the end of the talk, participants will be familiar with:

  1. An interesting enterprise application in the data space.
  2. Existing open-source generative code models and their benchmark results on tasks.
  3. The overall technical architecture for using LLMs to develop a low-code/no-code data assist solution for business.
  4. Python implementation details and practical challenges.


Prerequisites:

  1. Basic knowledge of using Python for machine learning.
  2. Understanding of large language models (LLMs).
  3. Familiarity with enterprise databases.

Content URLs:

Coming soon...

Speaker Info:

I am a Director of Data Science with 11+ years of relevant experience solving problems with advanced analytics, machine learning, and deep learning techniques. I started my career as a computer scientist at a central government research organization (Bhabha Atomic Research Centre), where I did research in a variety of domains such as conversational speech, satellite imagery, and text. I have been working at Fidelity Investments for the last 5 years, leveraging NLP, Gen AI, data science, and graph techniques to solve business use cases.

I have used Python throughout my career, both for solving data science problems and for pursuing research. I have published several academic and applied research papers and participated in multiple conferences over the years. In the past, I trained professionals in machine learning and was a guest lecturer for the Machine Learning subject (MTech course) in the WILP program at BITS Pilani.

Speaker Links:

Know Me

  1. LinkedIn
  2. GitHub

PyCon 2023: Conducted Workshop

My Blog: appliedmachinelearning

Open Source Contributions:

  1. finbert-embedding
  2. classitransformers
  3. PhraseExtraction

Section: Artificial Intelligence and Machine Learning
Type: Talk
Target Audience: Beginner
Last Updated: