Pyro Demystified: Bayesian Deep Learning
The ability to estimate uncertainty and incorporate it into decision making is crucial in sensitive applications like Self-driving Cars, (Semi-) Autonomous Surgery, Personalized Medicine, etc., where the consequences of a wrong decision are potentially catastrophic. A Probabilistic Program is a natural way to model such processes.
Pyro is a probabilistic programming language built on top of PyTorch. It is designed to support Bayesian Deep Learning, which combines the expressive power of Deep Neural Networks with the mathematically sound framework of Bayesian Modeling.
The objective of this session is to introduce the audience to Bayesian Modeling in Pyro, which provides a powerful set of abstractions for Probabilistic Modeling and Inference.
- A 4-step Bayesian Modeling process is introduced by considering Bayesian Linear Regression as an example.
- The performance of the model is evaluated by visualizing the learned parameters with uncertainty estimates.
- This process is applied to different models in different settings, each of which explores a unique aspect of Bayesian Modeling.
Models in consideration:
- Bayesian Neural Network (BNN): The BNN is trained on MNIST using Pyro and the model's behavior is analysed on the MNIST test set and Fashion MNIST.
- Variational Autoencoder (VAE): The VAE is used to generate images, in an unsupervised setting.
- Semi-Supervised VAE (SS-VAE): The SS-VAE is trained on MNIST with missing data, to understand how a Bayesian model deals with missing labels.
- Bayesian Modeling in Pyro (60 mins)
- Model Definition
- Guide Creation
- Generative Models
- Supervised Learning (25 mins)
- Unsupervised Learning (25 mins)
- Semi-supervised Learning (25 mins)
Bayesian Modeling in Pyro
Pyro supports Probabilistic Modeling and Inference through a set of effectful statements, sample and param, and a library of Effect Handlers, poutine. Probabilistic Modeling typically consists of the following steps:
- Model Definition
- Guide Creation
- Inference
- Model Evaluation
A model is a stochastic function composed of deterministic statements combined with randomness. The primary source of randomness in Pyro comes from distributions, a module in Pyro. A Distribution object represents a probability distribution. The function sample is an effectful statement that samples from a Distribution object. It is effectful because it has side-effects in addition to its primary purpose of returning a sample from a given distribution: it enables the Effect Handlers to keep track of "sample sites" in the model and to change the model's behaviour at runtime as necessary.
In Variational Inference, a family of distributions Q (with "nice" properties) is considered as a Variational Approximation to the true posterior. The Variational Distribution is optimized to minimise the KL-divergence to the exact posterior over the unknowns. This Variational Distribution is encoded as a stochastic function (guide) using Pyro's param statements. The inference algorithm identifies and aligns the "sample sites" in the model and the guide, and keeps track of the variational parameters defined using the param statements.
Stochastic Variational Inference (SVI) is Pyro's general-purpose inference algorithm. SVI iteratively takes gradient steps to minimise the negative ELBO objective, which is equivalent to minimising the KL-divergence from the approximate Variational Distribution (guide) to the true posterior over the latent variables.
The posterior predictive distribution over the outcome variable is estimated and plotted. A good model should account for most of the data points; if the model fails to explain the data, the model specification must be revised.
Supervised Learning — Uncertainty Estimation
So far, the simplest regression setting, Bayesian Linear Regression with a toy dataset, has been considered, to understand Bayesian Modeling and the mechanics of Pyro. In this section, a Bayesian Neural Network (BNN) is trained on the MNIST dataset. The model's performance on the MNIST test set and Fashion MNIST is explored.
Unsupervised Learning — Expressive Power
Variational Autoencoder (VAE) is the simplest setting for Deep Probabilistic Modeling. In this section, a neural network based VAE is implemented in Pyro. The model is trained on the Fashion MNIST dataset. New images are sampled from the decoder module as a demonstration.
Semi-supervised Learning — Handling Missing Data
In a semi-supervised setting, some of the data points are labelled and some are not. In a generative model, the missing data can be accounted for, quite naturally. A Semi-Supervised VAE (SS-VAE) is trained on MNIST with some of the labels randomly removed. New images conditioned on these labels are generated to show the model's performance.
- Bayesian Linear Regression
- Bayesian Neural Network (BNN)
- Variational Autoencoder (VAE)
- Poutine: Programming with Effect Handlers in Pyro
- Stochastic Function
- A Beginner's Guide to Variational Methods
- Stochastic Variational Inference
- Mean-Field Approximation
- Variational Lower Bound
- Posterior Predictive Distribution
Machine Learning practitioners interested in exploring Bayesian Deep Learning.
- Understand the need for Bayesian Deep Learning
- Learn to use Pyro's Modeling and Inference toolbox
- Learn Uncertainty Estimation
- Learn to build Deep Probabilistic models in Pyro
- A decent laptop (> 4 GB RAM)
- Python 3.x installed
- Latest versions of pyro, pytorch, matplotlib, pandas
- Basic knowledge of PyTorch and Bayes' Rule
Suriyadeepan Ramamoorthy, Research Engineer at Saama Technologies.