How I hand-coded core machine learning algorithms from scratch in Python, what I learned, and why you should too.

arjunkathuria



Description:

Machine learning is taking over the world, from your YouTube recommendations to your favorite voice assistant, and even your car and your smartphone's camera!
I would say machine learning is among the most widely discussed topics in the media lately, right up there with Kylie Jenner and Bezos's divorce ; )

The Premise, The Why.

Just like any field of science, Machine Learning too has certain basic concepts and algorithms that act as the fundamental building blocks of everything wonderful you see around you.

Fortunately or unfortunately, in machine learning these algorithms have been made available to us as wonderfully optimized, properly packaged and easy-to-use abstractions; think scikit-learn. Import a classifier and you are well on your way. We essentially treat and use them as BLACK BOXES!

Believe it or not, that is a MAGNIFICENT thing. This is a massive win for practicality, pragmatism and performance.

But it is just like what your algorithms teacher told you when you asked:

- Why are we learning all these different sorting algorithms, when in practice you will almost never roll your own implementation? (This was a friend, totally not me)

The teacher pulls down his thick bifocal glasses slightly, looks at you while marveling at your fantastic haircut, and replies in his firm but empathetic voice:

- Because the concepts you learn are applicable to a wide range of problems in computer science and will help you understand other problems and concepts better.

I had no reply. I mean, my friend had no reply, and was humbled.

The Agenda, The What.

In this talk, I will share a crisp take on what I learned while hand-coding these fundamental machine learning algorithms from scratch.

  • YOU will learn what happens under the hood when you use these classifier algorithms and how they work.
  • YOU will get familiar with the fundamental concepts and ideas that underpin almost all of these classifiers and more.
  • YOU will also be introduced to the math behind these algorithms, which helps further in understanding and demystifying them.

The algorithms I plan to cover are:

1) The Perceptron algorithm (where it all began)  
2) ADALINE or adaptive linear neuron.  
3) Logistic Regression.  
4) If time permits, Support Vector Machines.

There will also be Python code (not live coding) from my own implementation and training of these algorithms, to show formally what is happening and what does what, with illustrations where necessary. Trust me, this will be neat ; )
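As a taste of what such code can look like, here is a minimal sketch of the Perceptron learning rule in plain Python. It is an illustration written for this description, not the code from the talk: the function names, the learning rate, the epoch count and the tiny toy data set are all assumptions.

```python
def predict(weights, bias, x):
    """Unit step on the net input: returns +1 or -1."""
    activation = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1 if activation >= 0.0 else -1


def train_perceptron(samples, labels, lr=0.1, epochs=20):
    """Perceptron rule: nudge the weights only when a sample is misclassified."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        errors = 0
        for x, target in zip(samples, labels):
            # update is 0 for a correct prediction, +/- 2 * lr otherwise
            update = lr * (target - predict(weights, bias, x))
            if update != 0.0:
                weights = [w + update * xi for w, xi in zip(weights, x)]
                bias += update
                errors += 1
        if errors == 0:  # a full pass with no mistakes: converged
            break
    return weights, bias


# Hypothetical, linearly separable toy data (not the Iris data set)
X = [[2.0, 1.0], [3.0, 2.0], [-1.0, -1.5], [-2.0, -1.0]]
y = [1, 1, -1, -1]

weights, bias = train_perceptron(X, y)
predictions = [predict(weights, bias, x) for x in X]
print(predictions)
```

The early exit relies on the data being linearly separable; on data that is not, the loop simply runs for the full number of epochs.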

The Outcome

I would like my audience to leave with a clearer view and a greater appreciation of how machine learning happens in the real world, by focusing on what happens under the hood when you pull that classifier out of your favorite machine learning library. They should come away with the basic concepts that underpin the fundamental classifiers, a sense of what machine learning algorithms try to do, and why it is OKAY to look into the so-called black boxes once in a while ; )

Outline

  1. Author Introduction and the premise of the talk. [2 minutes]

    • The Why and how it came to be.
  2. Demystifying Machine Learning [2-3 minutes]

    • It's much less magic than you think it is. ✨
  3. The big picture of Machine Learning. [2-3 minutes]

    • The big picture ideas behind machine learning, setting up the audience nicely for the upcoming topics.
  4. Perceptron and ADALINE (ADAptive LInear NEuron) algorithms [7 minutes]

    • Understanding the Perceptron, where it all started, and its extension, the Adaline algorithm: how they work and how to train them, with relevant code and math.
  5. Logistic Regression Algorithm [7 minutes]

    • Explaining the LR algorithm: how it works, how to train it and what the parameters do, with accompanying Python code and some necessary math.
  6. Summing up and outro [2 minutes]

    ---- 1 minute margin ----

  7. Q&A [5 minutes]
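To give a flavour of items 4 and 5 of the outline, here is a rough sketch of logistic regression trained with batch gradient descent in plain Python. It is illustrative only and not the talk's actual code: the helper names, the learning rate, the epoch count and the one-feature toy data are assumptions. Adaline follows the same loop shape, but with a linear activation and a squared-error cost in place of the sigmoid and log-loss.

```python
import math


def sigmoid(z):
    """Squash the net input into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def net_input(weights, bias, x):
    return bias + sum(w * xi for w, xi in zip(weights, x))


def train_logistic_regression(samples, labels, lr=0.5, epochs=200):
    """Batch gradient descent on the mean log-loss."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    n = len(samples)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, target in zip(samples, labels):
            # derivative of the log-loss w.r.t. the net input
            error = sigmoid(net_input(weights, bias, x)) - target
            grad_w = [g + error * xi for g, xi in zip(grad_w, x)]
            grad_b += error
        # one update per pass over the whole data set
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


# Hypothetical one-feature toy data, centered around zero
X = [[-2.0], [-1.5], [-1.0], [1.0], [1.5], [2.0]]
y = [0, 0, 0, 1, 1, 1]

weights, bias = train_logistic_regression(X, y)
probabilities = [sigmoid(net_input(weights, bias, x)) for x in X]
predicted = [1 if p >= 0.5 else 0 for p in probabilities]
print(predicted)
```

Unlike the Perceptron, the output here is a probability, so the 0.5 threshold in the last step is a choice rather than part of the training rule.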

The 2-minute casual video for the talk, as asked by the administration, can be found here.

Peace.

Prerequisites:

Basic understanding of Python and school-level math.
Basic understanding of some linear algebra is encouraged but not necessary.
Curiosity.

Content URLs:

I have already written detailed and illustrated posts about these algorithms on my personal blog, which anyone wishing to read can find at:-

arjunkathuria.com

Respective Posts:-
1) Understanding Logistic Regression
2) Adaptive Linear Neuron (ADALINE)
3) Perceptron Learning Rule
4) Support Vector Machines

Speaker Info:

Hi,

I am Arjun Kathuria, an independent software developer and hacker from New Delhi, India.

I was a Google Summer of Code student in 2016 with the jQuery Foundation.

I also held the position of sole lead frontend software developer for a high-growth startup in Bangalore from early 2017 to mid 2018, which I left to pursue my other interests in computer science and music.

I am currently into Python, machine learning, full-stack development projects, exploring lower-level kernel stuff and learning Rust.

Speaker Links:

You can find my:-

1) GitHub profile, with all my open-source work and contributions.
My GSoC project - hammer.js - has upwards of 19,000 stars.

2) Personal blog, where I post about stuff as I learn.
The posts for the machine learning algorithms are already there, with nice illustrations.

Section: Data Science, Machine Learning and AI
Type: Talks
Target Audience: Intermediate
Last Updated:

Hi Arjun,

It's always a good idea to have something done using what I call 'first principles'. So, this talk would make a lot of sense from that point of view.

I had a quick look at some of the blog posts (a very quick look in fact). Unfortunately, there was a lot of math (not bad), but very little code.

Considering the time limit of 30 minutes, it may be a good idea to take just one algorithm, or maybe two, and if possible take a problem and demonstrate how well it can be solved using scikit-learn and your homemade implementation of the algorithms/models, along with a performance comparison etc. Understandably, your homemade implementation is not going to be as feature-rich or as performant, but looking at actual code that's early enough in its evolution and solves a particular problem would be of great value to the community.

So if you could align your proposal along those lines, it might be a good idea.

Abhijit Gadgil (~gabhijit)

Hi Abhijit,

Thanks for taking the time to look into this.

The blog posts are math-heavy by design and were for me to revisit what I learned as I learned it. The code is kept separate and is not included in the posts. It's fairly understandable code with low complexity.

Don't worry, it won't be a math class. I understand not everyone is mathematically inclined. I plan to demonstrate and explain the key concepts, like cost function minimization or maximization, in the abstract, using math only when absolutely necessary and being mindful of keeping it accessible to most of my audience.

I do plan to demonstrate how I got these algorithms to train, converge and deliver predictions using the seminal Iris data set, where flowers are classified into three categories based on their features. That is how I tested them.
We will see the results when we run these algorithms, along with graphs and plots that tell us how they performed, so that it's all very visual. Pardon me if that wasn't really clear in my proposal.

I think the time constraint you mentioned would only allow me to cover the first three. They are really two, since ADALINE is just an extension of the Perceptron with a small difference, which shouldn't take more than 5 minutes to cover. People would really want to hear about logistic regression too, since it's very popular.

Now, since this would be a talk about how these algorithms work, a performance comparison might not be the best fit, since performance is outside the scope of this talk. I'd be happy to show how easy it is to just pull a classifier from scikit-learn and train it, having explained what all its parameters do. That would flow nicely and elegantly from our previous understanding and make for those "aha, so THAT is what it does!" moments, as it did for me.

Looking forward to hearing what the team thinks of this.

Arjun

arjunkathuria

Hello Arjun,

We have put together a set of best practices for proposals - please take a look. Your proposal is fairly detailed, which is good. It will be great if you add an outline of the talk, the slides, and a two minute preview video.

Regards,

Abhishek Yadav (~zerothabhishek)

Hi PyCon India team!

Updated the proposal with a nice outline and the video.

arjunkathuria

Awesome!

Abhishek Yadav (~zerothabhishek)
