Discrete Probability Distributions: Learning & Testing using tools from Fourier Analysis

Bala_Priya_C


103

Votes

Description:

Motivation:

  • Discrete probability distributions show up wherever we have data in the real world.
  • Given the data, gaining some meaningful insights and inferences about the underlying discrete probability distribution is a fundamental problem in applied statistics, data analysis with interesting applications in statistical learning and theoretical computer science. For example, we may often want to learn an approximation to the distribution underlying the data or seek to estimate certain parameters of the distribution.
  • With the huge influx in data in the recent years, it is highly probable that the data points may come from a distribution with a potentially large alphabet size. (domain) Therefore, coming up with sample efficient algorithms to make meaningful inferences about the data becomes a primary area of interest.

Problem Statement:

In particular, this talk would focus on the following aspects:

  • We have narrowed down the class of distributions to n-k SIIRV.
  • An n-k SIIRV is the sum of n independent integer random variables each with a support set of {0,1,…,(k-1)}
  • The problem of interest here is Membership testing.
  • Given access to samples from an unknown distribution, can we possibly come up with sample optimal methods to check if the distribution belongs to a particular family or not?
  • The key idea is the sparse Fourier support of the n-k SIIRV with sufficiently large variance.

Key Idea:

  • Learn the distribution assuming that it belongs to the particular class of distributions. (the well-known distribution learning problem)
  • Use the hypothesis obtained to check if the assumption is TRUE/FALSE

Key Steps:

  • Generate samples from the n-k SIIRV and estimate mean and variance using the samples drawn
  • Compute the empirical PMF Q from the samples drawn
  • Compute the Fourier Transform (DFT modulo M) of the empirical PMF
  • Find the effective Fourier support L
  • Output a succinct Hypothesis H
  • Use the projection sub-routine
  • Sample complexity in checking if the hypothesis is True/False is linear in the effective support size

Such an abstract idea has been proved through formal mathematical reasoning but simulation always helps in gaining better insights. The algorithmic implementation is in Python primarily making use of SciPy stats module, fftpack in SciPy that’s helpful in computing the DFT modulo M of the empirical PMF and NumPy.

Outline of the talk: (Duration: 30 mins)

  • 0-5 mins : Motivation about the problem statement & general overview (Slides 1-3)
  • 5-8 mins : Explaining the problem of distribution membership testing (Slides 4-5)
  • 8-11 mins : n-k SIIRV and generating samples from an n-k SIIRV (Slides 6-7)
  • 11-15 mins : Elaborating on key idea, DFT modulo M and its approximate sparsity(Slides 8-10)
  • 15-20 mins: Explaining the algorithm's pseudocode (Slides 11-14)
  • 20-25 mins: Going over some results and possible open problems (Slides 15-16)
  • 25-30 mins: Citing the references for further reading and Q&A session (Slide 17)

Prerequisites:

  • Familiarity with basic python programming and NumPy
  • Familiarity with basic probability theory (Discrete Probability Distributions)
  • Anyone who is interested in probability theory is welcome to attend the talk.
  • The target audience group is Beginner-Intermediate

Video URL:

https://youtu.be/eK3xL2lnQgY

Content URLs:

This is the link to the draft of the slides prepared:

https://drive.google.com/file/d/19_wSvURQnh-MyS04F7qtu4n535EtF6Yd/view?usp=sharing

Speaker Info:

I'm a Wireless Communications and Signal Processing enthusiast with a keen interest in Mathematics, especially, Linear Algebra, Probability Theory and Convex Optimization. I'm passionate about Diversity and Inclusion and Women in Tech. I'm a volunteer at WomenTech Network and was a Global Ambassador for the WomenTech Global Conference 2020 & a member of forums such as Women in AI, WomenWhoCode.

I'm one of the recipients of the LiFT 2020 scholarships in the 'Women in Open Source' category which included a scholarship to attend the Open Source Summit & Embedded Linux Conference (OSS+ELC '20) I am also one of the student scholarship winners of the Grace Hopper Celebration, India (GHCI '20).

As part of my undergrad days, I've served as the Chairperson of the Students' Union (2017-18), Secretary of the Students' Union (2016-17), student member of Academic Council,Extra-curricular Activities and Student Welfare Committee at the Coimbatore Institute of Technology where I pursued my Bachelors in Electronics and Communication Engineering.

I've spoken on several occasions at the Institute level and also at NCIETFSC 2017 where I got the Best Paper Award for my work titled "Smart Agriculture Automation & Monitoring using NI myRIO".

I would love to be a part of PyCon India 2020 and I'd be extremely happy if I'm given the chance to speak at PyCon India 2020.

Speaker Links:

Here's the link to my

LinkedIn profile: here

GitHub: here

Section: Scientific Computing
Type: Talks
Target Audience: Intermediate
Last Updated: