Data Catalog enrichment using Generative AI
Sasidhar Donaparthi (~sasidhar)
Description:
One of the major challenges in data management is keeping the enterprise data catalog up to date with proper metadata. This takes a lot of effort and time from the data management and governance teams. Only subject matter experts can curate this data. I would like to present how we used Generative AI to create the column and table descriptions that were missing from the data catalog. I will cover the following topics in my talk:
- What is a data catalog, and what are the current challenges of managing metadata?
- Various LLMs and Generative AI techniques used to arrive at the optimal solution
- Challenges in evaluating the model output systematically and coming up with validation metrics
- How to win the confidence of the end users
- Lessons learnt
This is a use case we are actively pursuing at our company, and I would like to share the learnings and the challenges we have faced.
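To make the idea of filling in missing column descriptions concrete, here is a minimal sketch of one possible first step: finding the columns without a description and assembling a prompt for an LLM. It is an illustration only, not the approach used at our company. The Column class, the helper functions, the prompt wording and the customer_orders table are all hypothetical, and the call to an actual model is deliberately left out.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Column:
    """Hypothetical catalog entry for one column."""
    name: str
    data_type: str
    description: Optional[str] = None  # None or blank means "missing in the catalog"


def has_description(column: Column) -> bool:
    """True when the column carries a non-empty description."""
    return bool((column.description or "").strip())


def columns_missing_description(columns: List[Column]) -> List[Column]:
    """Return the columns whose description is absent or only whitespace."""
    return [c for c in columns if not has_description(c)]


def build_prompt(table_name: str, columns: List[Column]) -> str:
    """Assemble a plain-text prompt (hypothetical wording) for the missing columns."""
    known = [c for c in columns if has_description(c)]
    missing = columns_missing_description(columns)
    lines = [f"Table: {table_name}", "Columns that already have descriptions:"]
    lines += [f"- {c.name} ({c.data_type}): {c.description}" for c in known]
    lines.append("Write a one-sentence description for each of these columns:")
    lines += [f"- {c.name} ({c.data_type})" for c in missing]
    return "\n".join(lines)


# Hypothetical sample table, used only to exercise the functions above.
sample_columns = [
    Column("order_id", "INTEGER", "Unique identifier of the order"),
    Column("cust_id", "INTEGER"),
    Column("ord_dt", "DATE", "   "),
]

if __name__ == "__main__":
    print(build_prompt("customer_orders", sample_columns))
```

The existing descriptions are passed along as context so that the model can pick up the naming conventions of the table. The generated text would still need review by subject matter experts before it is written back to the catalog.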
Prerequisites:
Basic understanding of LLMs and basic knowledge of Python
Content URLs:
I can't make the content public due to my company's information security policies. I am glad to answer any questions. I will work with my company to create the content, get the necessary approvals, and then share the presentation here.
Speaker Info:
I am a mechanical engineering graduate with 25+ years of experience in the manufacturing and financial services domains. I started my career as a design engineer at a hydraulic turbine manufacturing company. After 5 years there, I started my IT journey at Aspect Development/i2 Technology. I worked primarily on data scrubbing, modelling, analysis and data migration projects for supply chain management. I then joined the technology services side of Fidelity, a financial services company, where I currently work as a data scientist. I have been using Python for the last 6+ years for automation, data analysis, data science, web development, etc. I am very excited about the endless opportunities that arise in day-to-day work and about applying Python to solve problems and automate day-to-day activities. I conduct regular training sessions on data analysis (NumPy, pandas, scikit-learn and Matplotlib) at my company.
I am a regular speaker at the PyCon India conference. I have given various talks and workshops at PyCon 2017, 2018, 2022 and 2023.
Speaker Links:
GitHub link - https://github.com/sdonapar
LinkedIn profile - https://www.linkedin.com/in/sasidonaparthi
Twitter handle - @sdonapar