Federated Learning in Python - Training Models Without Looking at Data!
Dhanshree Arora (~DhanshreeA)
Imagine a world where all the data generated on your phone and your wearables never left those devices, and your user experience did not suffer for it. Or a world where AI collaboration between data owners and developers was not slowed down by bureaucracy and data compliance laws like the GDPR. Federated Machine Learning, originally introduced in 2015, aims to create this world. It is distributed computing's gift to Artificial Intelligence: a way to train models without compromising data privacy.
Federated Machine Learning is a model training technique in which the data never leaves its source (for example, a data silo, a smartphone, or an IoT device). Instead, a copy of a global model is trained at the source, and only what it learns (for example, updated model weights) is shared back with the global model.
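To make the idea concrete, here is a minimal sketch of one common way to combine what the clients learn: weighted averaging of locally trained weights, often called federated averaging. It is an illustration under stated assumptions, not the system shown in the talk. The toy model (`y = w*x + b`), the synthetic data, and the helper names `local_update`, `federated_average`, and `make_client_data` are all hypothetical.

```python
import random

def local_update(weights, data, lr=0.1, epochs=20):
    """Train a copy of the global model (y = w*x + b) on one client's private data."""
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in data:
            err = (w * x + b) - y
            grad_w += 2 * err * x
            grad_b += 2 * err
        # Plain gradient descent on the mean squared error.
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    # Only the weights and the sample count leave the client, never the data.
    return (w, b), n

def federated_average(updates):
    """Combine client weights, weighting each client by its sample count."""
    total = sum(n for _, n in updates)
    w = sum(wts[0] * n for wts, n in updates) / total
    b = sum(wts[1] * n for wts, n in updates) / total
    return (w, b)

def make_client_data(rng, n):
    """Hypothetical private dataset: y = 3x + 1 plus a little noise."""
    xs = [rng.uniform(-1, 1) for _ in range(n)]
    return [(x, 3 * x + 1 + rng.gauss(0, 0.01)) for x in xs]

rng = random.Random(0)
clients = [make_client_data(rng, n) for n in (20, 50, 30)]

global_weights = (0.0, 0.0)
for _ in range(30):  # 30 communication rounds
    updates = [local_update(global_weights, data) for data in clients]
    global_weights = federated_average(updates)

print("learned (w, b):", global_weights)
```

In this sketch the coordinator only ever sees `(weights, sample_count)` pairs; the raw `(x, y)` points stay inside each client's list.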
In this talk, I will present an introduction to privacy-preserving machine learning with a focus on federated ML, and make a case for federated ML beyond Gboard and into enterprises. The session aims to demystify the jargon around Federated Machine Learning, specifically terms such as horizontal vs. vertical federated ML, cross-device vs. cross-silo training, centralized vs. decentralized training, and data-centric vs. model-centric federated learning.
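As a small taste of the jargon, the sketch below contrasts horizontal and vertical partitioning on a toy table. In the horizontal case, parties hold different users but the same features; in the vertical case, parties hold the same users but different features. The records, feature names, and helper functions here are hypothetical and only meant to illustrate the distinction.

```python
# Hypothetical toy table: one row per user, one column per feature.
records = {
    "user_1": {"age": 34, "steps": 8200, "purchases": 3},
    "user_2": {"age": 27, "steps": 4100, "purchases": 7},
    "user_3": {"age": 45, "steps": 9900, "purchases": 1},
    "user_4": {"age": 52, "steps": 3000, "purchases": 4},
}

def horizontal_split(table, user_groups):
    """Each party holds different users but the same feature columns."""
    return [{u: dict(table[u]) for u in group} for group in user_groups]

def vertical_split(table, feature_groups):
    """Each party holds the same users but different feature columns."""
    return [
        {u: {f: row[f] for f in feats} for u, row in table.items()}
        for feats in feature_groups
    ]

# Horizontal: e.g. two phones, each with its own users' complete rows.
party_a, party_b = horizontal_split(
    records, [["user_1", "user_2"], ["user_3", "user_4"]]
)

# Vertical: e.g. a fitness app and a retailer with overlapping users.
fitness_app, retailer = vertical_split(
    records, [["age", "steps"], ["purchases"]]
)

print(party_a)
print(retailer)
```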
I will briefly introduce existing federated ML implementations in Python and give a live code walk-through of a minimal centralized federated learning system built using python-socketio and Confluent Kafka.
Finally, I will discuss the challenges enterprises face in adopting federated machine learning. The most notorious difficulty with federated ML is non-IID data, that is, data that is not independent and identically distributed across source nodes. Other technical difficulties include network latencies that affect training coordination, the single point of failure in centralized federated learning systems, and computational limitations at source nodes. Additionally, in data-centric federated ML, where a data provider allows AI developers to use their data to build models, federated ML poses challenges for early-stage exploratory data analysis (EDA) and development.
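To illustrate what non-IID data can look like, the sketch below compares a shuffled (roughly IID) split of a labelled dataset with a label-skewed split in which each client sees only a narrow slice of the classes. The synthetic labels and the helper names `iid_partition`, `label_skew_partition`, and `label_histogram` are assumptions made for this example; real deployments can be skewed in other ways too, such as feature or quantity skew.

```python
import random
from collections import Counter

def iid_partition(labels, num_clients, rng):
    """Shuffle sample indices and deal them out evenly across clients."""
    idx = list(range(len(labels)))
    rng.shuffle(idx)
    return [idx[i::num_clients] for i in range(num_clients)]

def label_skew_partition(labels, num_clients):
    """Sort indices by label, then cut contiguous shards, one per client."""
    idx = sorted(range(len(labels)), key=lambda i: labels[i])
    shard = len(idx) // num_clients
    return [idx[i * shard:(i + 1) * shard] for i in range(num_clients)]

def label_histogram(labels, indices):
    """Count how many samples of each class a client holds."""
    return Counter(labels[i] for i in indices)

rng = random.Random(0)
labels = [i % 4 for i in range(400)]  # hypothetical dataset: 4 balanced classes

iid_hists = [label_histogram(labels, part)
             for part in iid_partition(labels, 4, rng)]
skewed_hists = [label_histogram(labels, part)
                for part in label_skew_partition(labels, 4)]

for name, hists in (("iid", iid_hists), ("label-skewed", skewed_hists)):
    for client_id, hist in enumerate(hists):
        print(name, "client", client_id, dict(sorted(hist.items())))
```

A model trained locally on one of the skewed clients only ever sees a single class, which is why naive averaging of such local models tends to behave worse than it does on the shuffled split.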
- Federated Learning 101 (~2 mins)
- A look at existing FL open source tools (~5 mins)
- Hands-on Federated Learning setup (Kafka/WebRTC) and an event-driven programming framework (python-socketio/Starlette/FastAPI) (~20 mins)
- A discussion of real-world edge cases for building an enterprise-ready FL system (~3 mins)
This session assumes familiarity with the following:
- Python fundamentals
- Event-driven programming
- Pub-sub/producer-consumer pattern
Hi there! I am Dhanshree. I've been writing code professionally for two odd years. I have been working with startups for the adrenaline-driven learning. I work with machine learning systems, with a flair for backend and cloud technologies. I have worked extensively with NLP systems, from building data pipelines to analysis, modeling, packaging, and deployment. Recently I've been learning InfoSec and private machine learning techniques, and building developer tooling for enabling private and ethical AI at Eder Labs, where I work as an MLOps Engineer.