Data-driven decision making using Structural Equation Modeling (SEM)

Dammalapati Sai Krishna (~dammalapati)



Description:

Business and government leaders today want to make data-driven decisions. To that end, they are enabling the collection of huge amounts of data.

But having too many variables is not always a good thing when building decision-support systems. The "Curse of Dimensionality" is a well-known challenge in statistical and machine learning modelling. As humans, we cannot visualise more than three or four variables at once, and large numbers of variables bring further issues such as multicollinearity and redundancy. Data scientists have therefore developed a suite of techniques to reduce the number of dimensions. Principal Component Analysis (PCA) is one famous technique. However, PCA is not an explainable method: each principal component is a linear combination of all the original variables, which makes the components difficult to interpret.
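To see why PCA components are hard to interpret, here is a minimal sketch using scikit-learn on simulated data (the six variables are stand-ins, not the workshop's actual flood dataset). Each component mixes every input variable:

```python
import numpy as np
from sklearn.decomposition import PCA

# Simulated dataset: 100 observations of 6 hypothetical flood-related variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))

pca = PCA(n_components=2)
scores = pca.fit_transform(X)

# Each principal component is a weighted mix of ALL six inputs,
# which is why the components lack a clear substantive meaning.
print(pca.components_.shape)  # (2, 6): 2 components x 6 variable weights
print(scores.shape)           # (100, 2): each observation scored on 2 components
```

Every row of `components_` has non-zero weight on every variable, so there is no natural label like "flood proneness" to attach to a component.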

It is in this context that the techniques of Structural Equation Modeling (SEM) are beneficial. SEM offers a more explainable way of reducing dimensions: variables are grouped into factors that have an underlying theory behind them.

In this workshop, I will perform SEM on Assam's flood data. Many variables are used to study floods in a region: rainfall, slope, flood damages, population affected, etc. We will group these variables into relevant factors/buckets and test whether the grouping is valid. In doing so, we reduce the variables to five factors: flood proneness, demographic vulnerability, government response, flood damages and access to infrastructure. Later, we will further reduce these factors to a single factor, the Composite Flood Risk Index.
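A grouping like the one above can be written as a semopy model description in lavaan-style syntax, where `=~` means "is measured by". The observed variable names below are illustrative placeholders, not the exact columns of the workshop dataset; the last line sketches how a second-order factor can sit on top of the five first-order factors:

```
FloodProneness =~ rainfall + slope + drainage_density
DemographicVulnerability =~ population_density + literacy_rate
GovernmentResponse =~ relief_camps + funds_released
FloodDamages =~ crop_area_damaged + houses_damaged + population_affected
Infrastructure =~ road_density + embankment_length

CompositeFloodRisk =~ FloodProneness + DemographicVulnerability + GovernmentResponse + FloodDamages + Infrastructure
```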

As a result, we will be able to score the 180 Revenue Circles of Assam on each of these factors and also calculate the Composite Flood Risk Index for each revenue circle. This index can help government leaders distribute resources and plan disaster response.
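Once factor scores are available for each revenue circle, turning them into a ranking is straightforward. A minimal sketch with pandas, using made-up scores for three hypothetical circles (the workshop derives real scores from the fitted SEM for all 180 circles):

```python
import pandas as pd

# Hypothetical factor scores for a few revenue circles.
scores = pd.DataFrame(
    {
        "flood_proneness": [0.9, 0.2, 0.5],
        "demographic_vulnerability": [0.6, 0.4, 0.8],
        "flood_damages": [0.7, 0.1, 0.3],
    },
    index=["Circle A", "Circle B", "Circle C"],
)

# Standardise each factor, then average into one composite index.
z = (scores - scores.mean()) / scores.std()
composite = z.mean(axis=1)

# Rank circles: a higher composite value means higher flood risk,
# and hence higher priority for resources and disaster response.
ranking = composite.sort_values(ascending=False)
print(ranking)
```

Equal weighting of standardised factors is only one design choice; the SEM loadings themselves can instead supply the weights, which is part of what makes the index defensible.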

What would participants learn from this workshop?

  1. A dimensionality reduction technique called Confirmatory Factor Analysis (CFA).
  2. A novel Python package called semopy.
  3. Developing data-driven indices/ranks for any given subject.

Format of the workshop:

  1. Introduction to the datasets [30 mins]
  2. Introduction to SEM and the semopy Python package [30 mins]
  3. Hypothesis testing with semopy package [60 mins]
  4. Results, interpretations and discussion [60 mins]

Prerequisites:

Linear models. Basic Python programming.

Speaker Info:

Sai Krishna is a Data Engineer at CivicDataLab and a graduate of the National Institute of Technology Karnataka (NITK 2017). Ever since graduation, he has taken an immense interest in working at the intersection of technology (data science) and public policy. Over the last five years, he has worked with NGOs, universities, governments and startups to deploy data science solutions to public policy problems such as disaster management, urbanisation and air pollution.

Speaker Links:

https://www.linkedin.com/in/saikrishnadammalapati/

Section: Data Science, AI & ML
Type: Workshops
Target Audience: Intermediate