The State of Privacy Preserving Machine Learning.
Dhanshree Arora (~DhanshreeA)
Privacy Preserving Machine Learning (PPML)
The need for privacy in AI-driven solutions is growing by the day, for both ethical and legal reasons. Given linkage attacks, the reverse engineering of datasets using different sources (e.g. the Netflix Prize), and the exploitation of memorization in neural networks, traditional methods from information security are not enough. Privacy should be proactive and built into systems by design, in addition to measures such as audit trails and alert messages. The machine learning landscape has made significant progress in the past few years towards preserving privacy by design. It does so by incorporating tools and techniques from information security, cryptography, distributed systems, and statistics. Federated Learning, for example, is one such approach: it allows training models without centralizing data, since centralized data is a privacy hazard (case in point: the Equifax data breach). Differential Privacy is another exciting technique, in which a data curator provides data with statistical noise added to it. The noise keeps most of the data's utility intact but makes it difficult to identify individual samples, which makes linkage attacks far less effective. Homomorphic Encryption draws heavily from cryptography and allows machine learning to take place on encrypted data.
In this talk, I will use existing case studies to explain why traditional approaches to security aren't enough. I will break down the approaches mentioned above and the open problems that accompany them. Finally, I will discuss the maturity of existing Python frameworks and their enterprise readiness.
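To make the Differential Privacy idea from the abstract a little more concrete, here is a minimal sketch of one common way to add statistical noise: the Laplace mechanism applied to a counting query. This is an illustration under stated assumptions, not the talk's own material. The function names (`laplace_noise`, `private_count`), the toy `ages` data, and the choice of `epsilon` are all hypothetical, and a real deployment would use a vetted library rather than hand-rolled sampling.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from Laplace(0, scale).

    The difference of two independent exponential draws with mean `scale`
    follows a Laplace distribution centred on zero with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng=None):
    """Return a noisy count of the records matching `predicate`.

    Assumption: a counting query has sensitivity 1, because adding or
    removing a single record changes the true count by at most 1.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if rng is None:
        rng = random.Random()
    true_count = sum(1 for record in records if predicate(record))
    sensitivity = 1.0
    # Smaller epsilon means a larger noise scale, i.e. stronger privacy.
    return true_count + laplace_noise(sensitivity / epsilon, rng)


if __name__ == "__main__":
    # Hypothetical toy data, used only for illustration.
    ages = [23, 35, 41, 29, 52, 67, 38, 45]
    noisy = private_count(ages, lambda age: age >= 40, epsilon=0.5)
    print(f"Noisy count of people aged 40 or over: {noisy:.2f}")
```

Any single answer is perturbed enough to obscure whether one particular person is in the data, while the noise averages out to zero, so the count stays useful in aggregate. Lowering `epsilon` strengthens privacy at the cost of accuracy.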
- Introduction to Privacy Preserving Machine Learning (~2 min)
- Current techniques in InfoSec and Data Anonymization (~5 min)
  - And why they are not enough
- The Landscape of PPML: Different approaches & Open Problems (~9 min)
  - Differential Privacy
  - Homomorphic Encryption
  - Federated Learning
  - Multiparty Computation
- Current Implementations in Python (~9 min)
- Questions (~5 min)
Work in Progress content slides here
Hi there! I am Dhanshree. I've been writing code professionally for two-odd years. I have been working with startups for the adrenaline-driven learning. I work with machine learning systems, with a flair for backend and cloud technologies. I have worked extensively with NLP systems, from building data pipelines to analysis, modeling, packaging, and deployment. Recently I've been learning InfoSec and private machine learning techniques, and building developer tooling to enable private and ethical AI at Eder Labs, where I work as an MLOps Engineer.