# Let's Learn Statistics!

**Bargava Subramanian (~bargava)**


**Description:**

Statistics supplies many of the concepts and thought processes that drive data science. But is it an arcane mathematical subject, filled with esoteric formulae and concepts, and hence difficult to learn? We think not.

BUT?!!

*"I am a programmer"*, *"Math is not my cup of tea"*, *"It's been ages since I did math. I don't know if I am capable of doing it"*, *"WTH? I thought everything is commoditized/productized. So, why learn statistics?"* We hear ya!

Why don't we take an application-centric programming approach to learn some of the basic concepts that drive data science? Is it possible? Most definitely.

Heavily inspired by Allen Downey's books *Think Stats* and *Think Bayes*, and by his PyCon US workshop(s), we try to demystify these concepts using real-life examples. The key concepts we plan to cover are:

- Standard deviation, variance, covariance (*Assumption*: hoping everyone knows a bit about mean, median, mode :) )
- Probability distribution
- What is hypothesis testing?
- What are the t-test, p-value, chi-squared test, and confidence intervals?
- Correlation
- Confidence level and Significance level
- Re-sampling and its relevance in the world of Big Data
- What is A/B testing?
- A simple linear regression model
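
To give a flavour of the programming-first approach, here is a minimal sketch of a few of the concepts listed above (variance, covariance, correlation, and re-sampling). It is not taken from the workshop material: it uses only the Python standard library rather than Pandas/NumPy/SciPy, the data are made up for illustration, and the function names (`variance`, `covariance`, `correlation`, `bootstrap_ci`) are hypothetical.

```python
from __future__ import division, print_function

import math
import random


def mean(xs):
    return sum(xs) / len(xs)


def variance(xs):
    """Sample variance (divides by n - 1)."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def covariance(xs, ys):
    """Sample covariance of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def correlation(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    return covariance(xs, ys) / math.sqrt(variance(xs) * variance(ys))


def bootstrap_ci(xs, n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean.

    Re-samples the data with replacement, records the mean of each
    re-sample, and reads the interval off the sorted means.
    """
    rng = random.Random(seed)
    means = sorted(
        mean([rng.choice(xs) for _ in xs]) for _ in range(n_resamples)
    )
    lo = means[int((1 - level) / 2 * n_resamples)]
    hi = means[int((1 + level) / 2 * n_resamples) - 1]
    return lo, hi


# Made-up illustrative data: hours studied vs. test score.
hours = [1, 2, 3, 4, 5, 6, 7, 8]
scores = [52, 55, 61, 60, 68, 71, 75, 80]

print("variance of hours:", variance(hours))
print("std dev of hours:", math.sqrt(variance(hours)))
print("covariance:", covariance(hours, scores))
print("correlation:", correlation(hours, scores))
print("95% bootstrap CI for mean score:", bootstrap_ci(scores))
```

Writing the formulas out as short functions keeps each definition visible; in practice the same quantities would come from library calls.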

We will do the data analysis using Pandas along with NumPy and SciPy, and some plotting using matplotlib/seaborn.

We will use IPython Notebook to drive the workshop. The contents of the workshop are available at the repo: https://github.com/rouseguy/intro2stats. *It is currently a work in progress. All the code, data and presentations will be available in this repository prior to the workshop.*

**Prerequisites:**

**Technical/Software Knowledge**

- Basics of Python (must): attendees should know how to write functions; read in and parse a text file (csv, txt, fwf); use conditional and looping constructs; use standard libraries like os and sys; and work with lists, list comprehensions, and dictionaries.
- Introduction to Pandas, NumPy, SciPy (good to have).

Links to get started on all of them are given below in the *Content urls* section.

**Software Requirements-Must have**

- Python 2.7
- git

**Software Requirements-Recommended**

We will clone a git repo and work off it. The link will be posted closer to the workshop date. The repo will include a requirements file that can be used to install all the necessary libraries. For the sake of completeness, we will need the latest versions of the following libraries:

- NumPy
- Pandas
- SciPy
- Matplotlib
- Seaborn
- IPython (along with IPython notebook)

**Software-Optional**

If attendees are comfortable with it, they can install and use Anaconda. If using Anaconda, please verify before the workshop starts that all the requisite libraries are installed. *Disclosure:* I use Anaconda.
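
One possible way to check that the recommended libraries are installed is sketched below. This is not part of the workshop repo. The helper `missing_packages` and the `REQUIRED` list are hypothetical names, and the import names are assumed to be the usual lowercase ones (with `IPython` as the exception).

```python
from __future__ import print_function

import importlib

# Assumed import names for the recommended libraries listed above.
REQUIRED = ["numpy", "pandas", "scipy", "matplotlib", "seaborn", "IPython"]


def missing_packages(names):
    """Return the names from `names` that fail to import."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


missing = missing_packages(REQUIRED)
if missing:
    print("Missing libraries: " + ", ".join(missing))
else:
    print("All required libraries are installed.")
```

The check only confirms that each library imports; it does not verify versions.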

**Content URLs:**

- Workshop Repo- Introduction to Statistics
- Introduction to Pandas
- Introduction to Numpy and Scipy
- Introduction to Python
- Introduction to Statistics by Allen Downey - Book
- Introduction to Statistics by Allen Downey - PyCon 2015, Montreal - Video
- Introduction to Bayesian Statistics by Allen Downey - Book

**Speaker Info:**

- Bargava Subramanian is a Senior Statistician at Cisco. He has a Master's from the University of Maryland, College Park, USA.
- Raghotham is a full-stack developer at RedMart. He has a Master's from BITS, Pilani.

**Speaker Links:**

- Introduction to Classification Methods in Machine Learning, Fifth Elephant 2014, Bangalore
- Data processing using Blaze, BangPypers Jan 2015, Bangalore
- Visualization Libraries in Python, BangPypers Apr 2015, Bangalore

## 0

This is wonderful course content for beginners like me. Please let me know when you are conducting it.

Sanjay Kumar (~sanjay2)