Probability for Hackers
Did you just say probability? Mathematics at a tech conference? The word tends to elicit strong reactions, positive or negative depending on whom you ask, because probability has a reputation for being hard to crack and hard to keep track of: it comes with seemingly endless jargon, abstract concepts, Greek-letter notation and more. Each concept carries its own subtle assumptions, and everything is interlinked. You may feel you had enough mathematics at university, but the way it is usually taught is somewhat isolated from its practical applications (more emphasis on theory, less hands-on experience), even though we encounter probability and statistics in our daily lives. So I am giving ‘hacking probability using code’ a go.
We live in a world of ‘Big Data’ today. Yet no matter how large or complex a dataset is, it provides only incomplete information about the questions that interest us, and this incomplete information leads to uncertainty. This is where ‘probability’ comes into the picture. Probability is “the science of uncertainty”: it gives us ways to quantify uncertainty, and it is one of the primary tools for designing algorithms that model complex data and let a computer make predictions about new or uncertain events. Many of these algorithms are Machine Learning techniques, which provide automated methods of data analysis. In fact, all of us already use them: automated spam detection and filtering in e-mail, and product/video recommendations (e.g., "customers who bought/watched X are also likely to buy/watch Y"), are two examples. Both are applications of probability in computer science. Most machine learning techniques are rooted in probabilistic methods, which help us answer questions such as: What is the best prediction about the future, given some past data? What is the best model to explain some data? Which measurement should I perform next? This is why it is important to learn probability.
In this talk, I will discuss how you can use your coding skills to "hack probability", replacing some of the theory and jargon with intuitive computational approaches. My intention is not to explain every concept rigorously, but to lay enough of them on the table to emphasize the role of probability in fast-growing areas such as Artificial Intelligence (a superset of Machine Learning and Deep Learning) and Data Science.
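As a taste of that computational approach, here is a minimal Python sketch that estimates P(heads) for a simulated fair coin and compares it with the theoretical value 0.5. It is a hypothetical illustration, not the demo used in the talk; the function name simulate_coin_tosses and the fixed seed are assumptions chosen to keep runs reproducible.

```python
import random


def simulate_coin_tosses(n_tosses, seed=None):
    """Toss a simulated fair coin n_tosses times and return the fraction of heads."""
    rng = random.Random(seed)  # private generator so a given seed always replays the same tosses
    # rng.random() is uniform on [0, 1); values below 0.5 count as heads
    heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
    return heads / n_tosses


if __name__ == "__main__":
    theoretical = 0.5  # P(heads) for a fair coin
    for n in (10, 100, 1_000, 10_000, 100_000):
        experimental = simulate_coin_tosses(n, seed=42)
        print(f"{n:>7} tosses: P(heads) ~ {experimental:.4f} "
              f"(error {abs(experimental - theoretical):.4f})")
```

As the number of tosses grows, the experimental frequency tends to settle near 0.5 (the law of large numbers), although any single short run can still wander noticeably.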
“Not once, but twice AI was revolutionized by people who understood Probability Theory”
- Stanford University | CS 109: Probability for Computer Scientists
a. About me
b. Questions to get to know the audience
Diving into Probability (the interactive way)
a. Coin-toss experiment using jQuery
b. Comparing theoretical vs. experimental probability with D3.js
c. Simulating the coin-toss experiment with Python
Ingredients for Modelling Uncertainty
a. Sample space
b. Axioms of Probability
Introduction to Random Variables
Relation between Random Variables
a. Joint Probability
b. Marginal Probability
c. Conditional Probability
d. Dependence & Independence
Demystifying Bayes' Theorem
Application of Probability Theory
a. Naive Bayes Algorithm for Spam filtering
Take-away message
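To make the Bayes' theorem and Naive Bayes spam-filtering items above concrete, here is a minimal from-scratch sketch in Python. The training messages are invented for illustration, and whitespace tokenization, add-one (Laplace) smoothing and log-probabilities are common conventions assumed here, not necessarily what the talk will present. The names train and predict are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy training data: (message, label) pairs invented for illustration.
TRAINING = [
    ("win money now", "spam"),
    ("limited offer win prize", "spam"),
    ("cheap money offer", "spam"),
    ("meeting at noon", "ham"),
    ("project update attached", "ham"),
    ("lunch at noon tomorrow", "ham"),
]


def train(data):
    """Count class frequencies and per-class word frequencies."""
    class_counts = Counter(label for _, label in data)
    word_counts = {label: Counter() for label in class_counts}
    for text, label in data:
        word_counts[label].update(text.split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return class_counts, word_counts, vocab


def predict(text, class_counts, word_counts, vocab):
    """Return the label with the highest posterior score.

    Bayes' theorem: P(label | words) is proportional to P(label) * P(words | label).
    The "naive" assumption treats words as conditionally independent given the
    label, so P(words | label) becomes a product of per-word probabilities.
    Summing logs avoids numerical underflow; add-one smoothing avoids log(0).
    """
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)  # log prior P(label)
        n_words = sum(word_counts[label].values())
        for word in text.split():
            likelihood = (word_counts[label][word] + 1) / (n_words + len(vocab))
            score += math.log(likelihood)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(TRAINING)
print(predict("win a cheap prize", *model))         # spam
print(predict("meeting tomorrow at noon", *model))  # ham
```

Working in log space turns the product of many small probabilities into a sum, which is why the scores can be compared safely even for long messages.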
- High-school-level mathematics:
  - basic set theory (what a set is and elementary set operations)
  - combinatorics (different ways of counting)
Most importantly, try to get the following myth out of your mind:
Some people have brains that are pre-wired for mathematical excellence, while everyone else is doomed to struggle with the subject.
- MLopt blog post (a helpful read)
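The set-theory and counting prerequisites listed above are already enough to compute probabilities. Here is a minimal Python sketch, assuming two fair six-sided dice so that all 36 outcomes are equally likely: the sample space is a set of tuples, events are subsets of it, and joint, marginal and conditional probabilities, plus an independence check, reduce to set operations and counting. The helper names prob and cond_prob are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Sample space: all ordered outcomes of rolling two six-sided dice.
omega = set(product(range(1, 7), repeat=2))  # 6 * 6 = 36 outcomes


def prob(event):
    """P(event) = |event| / |omega|, assuming all outcomes are equally likely."""
    return Fraction(len(event & omega), len(omega))


def cond_prob(a, b):
    """P(A | B) = P(A and B) / P(B)."""
    return prob(a & b) / prob(b)


first_is_six = {o for o in omega if o[0] == 6}         # event A
sum_is_seven = {o for o in omega if sum(o) == 7}        # event B
sum_at_least_ten = {o for o in omega if sum(o) >= 10}  # event C

print(prob(omega))                         # 1 (axiom: the whole sample space has probability 1)
print(prob(first_is_six))                  # 1/6
print(prob(first_is_six & sum_is_seven))   # 1/36 (joint probability)

# Marginal probability: sum the joint P(first=6, second=j) over every j
marginal = sum(prob({(6, j)}) for j in range(1, 7))
print(marginal)                            # 1/6

print(cond_prob(first_is_six, sum_at_least_ten))  # 1/2 (conditional probability)

# A and B are independent iff P(A and B) == P(A) * P(B)
print(prob(first_is_six & sum_is_seven) == prob(first_is_six) * prob(sum_is_seven))          # True
print(prob(first_is_six & sum_at_least_ten) == prob(first_is_six) * prob(sum_at_least_ten))  # False
```

Using Fraction instead of floats keeps the results exact, so equality checks such as the independence test are not disturbed by rounding.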
I am a Machine Learning Engineer at Juxt-Smart Mandate Analytical Solutions Pvt Ltd, and I like to explore the jungle of data. My survival arsenal contains Python, Pandas, NumPy, SciPy, Matplotlib and scikit-learn. When I am not at work, I like to read miscellaneous blog posts ranging from tech to lifelong learning. :)