Analysing Datasets using Pandas in a Privacy-Preserving Manner
Hrishikesh Kamath (~kamathhrishi)
Imagine a future where we could analyse all the healthcare datasets in the world to develop better healthcare solutions, or all fraudulent-activity datasets to develop better fraud prediction models. Unfortunately, most datasets are confined within organisational boundaries, primarily because of regulations such as GDPR, HIPAA and CCPA, or because of intellectual property concerns. These constraints prevent industries such as healthcare, LegalTech, fintech and banking from fully leveraging their datasets to solve problems.
GreyNSights addresses this problem through privacy-preserving data analysis. It is a framework that lets a data analyst analyse and transform sensitive datasets remotely using Pandas, without the data owners having to move the data beyond organisational boundaries. Privacy is protected in two ways. First, the analyst can see only aggregate statistics, never individual data rows; this is enforced using pointer-based graph verification. Second, query results are differentially private, so the aggregates themselves do not leak individual data points. GreyNSights can also analyse and transform datasets held by multiple data owners, such that each owner's individual query results remain hidden from the analyst. This lets analysts query datasets from different data owners as if they were a single dataset, an approach known as Federated Analytics. Currently, Federated Analytics support is limited to linear queries.
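To make the idea of a differentially private aggregate concrete, here is a minimal sketch of the classic Laplace mechanism applied to a count query. It is not GreyNSights code and does not use its API: the function names, the `patients` records and the choice of `epsilon` are all made up for illustration, and only the Python standard library is used.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from Laplace(0, scale).

    The difference of two independent exponential variables with mean
    `scale` follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(rows, predicate, epsilon, rng):
    """Return a noisy count of the rows that satisfy `predicate`.

    Adding or removing one row changes a count by at most 1, so the
    sensitivity is 1 and the Laplace noise scale is 1 / epsilon.
    """
    true_count = sum(1 for row in rows if predicate(row))
    sensitivity = 1.0
    return true_count + laplace_noise(sensitivity / epsilon, rng)


# Hypothetical records, invented purely for this example.
patients = [
    {"age": 34, "diabetic": True},
    {"age": 51, "diabetic": False},
    {"age": 47, "diabetic": True},
    {"age": 29, "diabetic": False},
    {"age": 62, "diabetic": True},
]

rng = random.Random(0)  # seeded only so the example is repeatable
noisy_diabetic_count = private_count(
    patients, lambda row: row["diabetic"], epsilon=0.5, rng=rng
)
print(f"Noisy count of diabetic patients: {noisy_diabetic_count:.2f}")
```

The analyst only ever receives the noisy value. A smaller `epsilon` means more noise and stronger privacy, which is the privacy-utility tradeoff in miniature.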
- Privacy Problem (6 mins): Why data collaboration between organisations is difficult, how data can leak, and what solving the problem would make possible.
- Overall Approach of GreyNSights (6 mins): Principles of design used by GreyNSights to offer an optimal privacy-utility tradeoff
- Pandas (3 mins): A short introduction to Pandas
- Use Case 1: Demonstrate how GreyNSights can be used to analyze and transform a dataset
- Use Case 2 (5 mins): Demonstrate how Federated Analytics can be leveraged
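As a rough illustration of why linear queries suit Federated Analytics, the sketch below shows that a mean over several owners' data can be computed from per-owner partial sums and counts, without pooling any rows. It is an assumption-laden toy rather than the GreyNSights protocol: the owner names and values are invented, and it makes no attempt to hide each owner's partial result from the analyst, which a real system would have to handle separately. In a Pandas setting, the local partials would typically come from calls such as `Series.sum()` and `Series.count()`.

```python
from typing import Dict, List, Tuple


def local_partials(values: List[float]) -> Tuple[float, int]:
    """Computed by each data owner on its own data: (sum, row count)."""
    return sum(values), len(values)


def federated_mean(partials: List[Tuple[float, int]]) -> float:
    """Combine per-owner partials into a global mean.

    Sums and counts are linear, so the total is just the sum of the
    per-owner totals; no individual rows are needed.
    """
    total = sum(partial_sum for partial_sum, _ in partials)
    count = sum(partial_count for _, partial_count in partials)
    return total / count


# Hypothetical owners and readings, invented for this example.
owner_data: Dict[str, List[float]] = {
    "hospital_a": [120.0, 135.0, 128.0],
    "hospital_b": [110.0, 142.0],
    "hospital_c": [131.0],
}

partials = [local_partials(values) for values in owner_data.values()]
fed_mean = federated_mean(partials)

# Reference value from pooling all rows, shown only to check the result.
pooled = [x for values in owner_data.values() for x in values]
central_mean = sum(pooled) / len(pooled)

print(f"Federated mean: {fed_mean:.3f}, pooled mean: {central_mean:.3f}")
```

A non-linear query such as a median cannot be assembled from per-owner results in this simple way, which gives some intuition for why support is currently restricted to linear queries.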
Basic Pandas, Basic Data Analysis
I currently work at Ederlabs as a Privacy AI Engineer, developing solutions for privacy-preserving AI. My general interests are in Machine Learning and Privacy-Preserving Systems.