Demystifying Machine Learning Predictions: A Hands-on Guide with SHAP

Dhruv Nigam (~dhruv40) | 30 May, 2024

Description:

You build an extraordinary machine-learning model (LLMs + Transformers + XGBoost + special relativity) to predict whether a potential customer would default if given a loan. You are proud of yourself, and the bank you work for is happy. The model is deployed in production.

A customer comes in the next day with a loan application. Your model predicts that he is likely to default on the loan, and the system rejects the application. The customer is furious and will not leave the bank without an answer. WHY WAS HE REJECTED!?

SHAP comes to the rescue! This talk will demonstrate how to use SHAP to explain individual model predictions. We'll explore how SHAP uses Shapley values from cooperative game theory to split a prediction into contributions from individual features.
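To make the game-theory idea concrete, here is a minimal sketch of exact Shapley values computed by brute force over all feature coalitions. It is not how the SHAP library computes them internally (SHAP uses faster, model-specific approximations), and the scoring function `toy_default_score`, the feature values, and the baseline are all hypothetical, chosen only for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one prediction, by enumerating coalitions.

    Features outside a coalition are replaced by their baseline value.
    The cost grows exponentially with the number of features.
    """
    n = len(x)
    features = list(range(n))

    def value(coalition):
        # Model output when only the features in `coalition` keep their real values.
        mixed = [x[i] if i in coalition else baseline[i] for i in features]
        return predict(mixed)

    phi = [0.0] * n
    for i in features:
        others = [j for j in features if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                # Marginal contribution of feature i when it joins coalition s.
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical toy risk score (assumption): not a real credit model.
def toy_default_score(row):
    income, debt_ratio, missed_payments = row
    score = 0.2 - 0.001 * income + 0.5 * debt_ratio + 0.1 * missed_payments
    # Interaction term: missed payments matter more when debt is high.
    score += 0.2 * debt_ratio * missed_payments
    return score


applicant = [40.0, 0.8, 2.0]  # hypothetical rejected applicant
baseline = [60.0, 0.3, 0.0]   # hypothetical "typical" customer

phi = shapley_values(toy_default_score, applicant, baseline)
for name, contribution in zip(["income", "debt_ratio", "missed_payments"], phi):
    print(f"{name}: {contribution:+.3f}")
```

The contributions add up to the gap between the applicant's score and the baseline score, which is the property that lets you tell a customer how much each feature pushed the decision.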

By the end of this session, you'll be equipped with the knowledge and hands-on experience to:

Interpret Individual Predictions: Gain insights into why your model makes specific predictions for new data points.
Identify Key Features: Understand which features significantly impact your model's decisions.
Debug and Improve Models: Diagnose potential biases and use SHAP's explanations to refine your model for better performance.
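The difference between explaining one prediction and identifying key features overall can be sketched as follows. This assumes a linear model with features treated as independent, where the attribution of a feature reduces to its weight times its deviation from the dataset mean. The feature names, weights, and rows are hypothetical, and a real session would use the SHAP library on a trained model instead.

```python
from statistics import mean

# Hypothetical feature names, weights, and rows (assumptions for illustration).
feature_names = ["income", "debt_ratio", "missed_payments"]
weights = [-0.001, 0.5, 0.1]
rows = [
    [40.0, 0.8, 2.0],
    [75.0, 0.2, 0.0],
    [55.0, 0.5, 1.0],
    [90.0, 0.1, 0.0],
]

# Dataset average of each feature, used as the reference point.
feature_means = [mean(column) for column in zip(*rows)]


def linear_attributions(row):
    # For a linear model with independent features, the attribution of
    # feature i is w_i * (x_i - mean_i).
    return [w * (x - m) for w, x, m in zip(weights, row, feature_means)]


per_row = [linear_attributions(row) for row in rows]

# Local view: why did the model score the first applicant the way it did?
local_explanation = dict(zip(feature_names, per_row[0]))

# Global view: mean absolute attribution of each feature across all rows.
global_importance = {
    name: mean(abs(v) for v in column)
    for name, column in zip(feature_names, zip(*per_row))
}
ranking = sorted(global_importance, key=global_importance.get, reverse=True)

print("Local explanation:", local_explanation)
print("Features ranked by global importance:", ranking)
```

The local view answers the furious customer; the global view is the kind of summary that helps spot a feature carrying more weight than it should.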

Outline

Introduction to Explainable AI (XAI) and its significance (5 minutes)
Unveiling the Black Box: Understanding Machine Learning Model Predictions (5 minutes)
Introducing SHAP: A Game-Changer for Explainable AI (5 minutes)
Hands-on SHAP with a Bank Default Dataset (15 minutes)
- Loading and Preparing the Bank Default Data
- Building a Machine Learning Model for Loan Default Prediction
- Explaining Predictions using SHAP's force_plot and summary_plot functions
Conclusion and Q&A (5 minutes)

Prerequisites:

Python and basic data science.

Video URL:

https://drive.google.com/drive/folders/1CXHJLazxbqsdeeH8yXesQB-518FnpKkd?usp=drive_link

Content URLs:

Based on an internal talk I previously gave using a Jupyter notebook: https://github.com/dhruvnigam93/credit-risk-modelling-primer/blob/main/Credit%20risk%20modelling.ipynb

Speaker Info:

Dhruv Nigam

Dhruv is a machine learning engineer who loves to build and deploy models at scale using Python. At Dream11, he leverages uplift modeling, reinforcement learning, and supervised learning to create action systems that enhance the user experience for over 100 million users. Before Dream11, Dhruv was a Director and founding Data Scientist at Protium, where he was key in scaling data science infrastructure from scratch to serve over 500k customers. He established core data engineering pipelines, data models, and deployment frameworks (GitLab CI/CD, FastAPI, EC2, MLflow) for machine learning models. He has spoken at various prestigious venues, including a sponsor talk at CODS COMAD 2024. He has a bachelor's and a master's degree in Electrical Engineering from IIT Bombay.

Ved Prakash

Ved is a skilled ML engineer with 9+ years of experience in conceptualizing and deploying large-scale machine learning and deep learning solutions. At Dream11, he has been a key player in reengineering the core contest generation engine. He is currently engaged in building state-of-the-art deep learning models tailored for tabular data domains. Before joining Dream11, Ved led the search and personalization initiatives at Paytm, where he built and deployed cutting-edge real-time machine learning solutions.

Speaker Links:

Dhruv

LinkedIn - www.linkedin.com/in/dhruv-nigam-52531176.
GitHub - https://github.com/dhruvnigam93.
Twitter - https://twitter.com/druubeey.
Talk on credit risk modeling organized by Databuzz and DPhi - https://www.youtube.com/live/4acAw17khkY?si=vD-83gcY99CehXis.

Ved

https://github.com/ved93.
https://www.linkedin.com/in/vedthedataguy/.
Talk on real-time ML: challenges and solutions - https://www.youtube.com/watch?v=DD5f-Gz1890.

Section:	Artificial Intelligence and Machine Learning
Type:	Talk
Target Audience:	Intermediate
Last Updated:	09 Jun, 2024
