PyProfound: How to train the best machine learning model with zero lines of code?

Rishiag25 | 30 Jun, 2016

Description:

Background -

Building a machine learning model is easy, but getting that elusive 1 percent increase in precision or recall when you are already in the high nineties is as frustrating as watching Sachin bat through his 90s.

The sheer abundance of available algorithms and their hyperparameters becomes apparent when we look at the number of papers submitted to the machine learning sections of arXiv on any given day [1]. Sifting through the many combinations of algorithms and parameters is therefore work for a seasoned professional.

Finding the best machine learning model for your particular use-case is made harder by two prerequisites: knowing one of the data-science-centric programming languages such as Python, R or Julia, and knowing how to extract the best features from the available dataset. Pre-processing the data to handle missing values, categorical features and columns with low information content becomes as important as the algorithm you have chosen.
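To make the pre-processing steps above concrete, here is a minimal sketch in scikit-learn and pandas. It is not taken from PyProfound: the toy dataset, the column names and the choice of median and most-frequent imputation are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_selection import VarianceThreshold
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy dataset: "age" has a missing value, "city" is categorical
# with a missing value, and "constant" carries no information.
df = pd.DataFrame({
    "age": [25.0, np.nan, 40.0, 31.0],
    "city": pd.Series(["Pune", "Delhi", "Pune", np.nan], dtype=object),
    "constant": [1.0, 1.0, 1.0, 1.0],
})

# Numeric columns: fill missing values with the column median.
numeric = Pipeline([("impute", SimpleImputer(strategy="median"))])

# Categorical columns: fill missing values with the most frequent category,
# then one-hot encode.
categorical = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("encode", OneHotEncoder(handle_unknown="ignore")),
])

preprocess = Pipeline([
    ("columns", ColumnTransformer(
        [
            ("num", numeric, ["age", "constant"]),
            ("cat", categorical, ["city"]),
        ],
        sparse_threshold=0.0,  # always return a dense array
    )),
    # Drop columns whose variance is zero (here, the "constant" column).
    ("drop_constant", VarianceThreshold(threshold=0.0)),
])

features = preprocess.fit_transform(df)
print(features.shape)
```

Zero variance is only the simplest proxy for "low information content"; a real tool would probably use a stricter criterion.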

About PyProfound -

PyProfound is an open-source desktop application, built with Electron and Python, that takes a dataset of your choice and returns the machine learning model with the best cross-validation score among all the scikit-learn algorithm and hyperparameter combinations it searches.
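As a rough illustration of what "scores the best on cross-validation" means, the sketch below compares two scikit-learn classifiers by their mean 5-fold cross-validation accuracy. The Iris dataset and the two candidate models are assumptions chosen for brevity, not PyProfound's actual search space.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Hypothetical candidate set; a real search would cover many more estimators.
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(random_state=0),
}

# Mean accuracy over 5 cross-validation folds for each candidate.
scores = {
    name: cross_val_score(model, X, y, cv=5).mean()
    for name, model in candidates.items()
}

best_name = max(scores, key=scores.get)
print(best_name, round(scores[best_name], 3))
```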

For textual data, the feature extraction module automatically generates natural language features such as part-of-speech tags and named entities. As part of the search for the best model, we also plan to compare the parsing output of popular open-source text parsers such as Stanford’s CoreNLP, NLTK, spaCy and Google’s Parsey McParseface.

A bot interface to the application is also planned. It should make the application much easier to use, as if a machine learning expert were guiding you from the other end.

During the talk, we will explain how the application runs through a battery of pre-processing steps and performs a grid-search-like operation on the feature vectors to find the one true classifier.
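A grid-search-like operation of this kind can be sketched with scikit-learn's `Pipeline` and `GridSearchCV`. This is an assumed illustration, not PyProfound's implementation: the dataset, the scaling step, the two classifiers and the parameter values are all hypothetical.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Pre-processing and classifier chained together; the "clf" step is a
# placeholder that the grid below swaps out.
pipeline = Pipeline([("scale", StandardScaler()), ("clf", SVC())])

# Each dict is one sub-grid: a classifier plus the hyperparameters to try.
param_grid = [
    {
        "clf": [SVC()],
        "clf__C": [0.1, 1, 10],
        "clf__kernel": ["linear", "rbf"],
    },
    {
        "clf": [RandomForestClassifier(random_state=0)],
        "clf__n_estimators": [10, 50],
        "clf__max_depth": [None, 3],
    },
]

# Every combination is scored with 5-fold cross-validation.
search = GridSearchCV(pipeline, param_grid, cv=5)
search.fit(X, y)

print(search.best_params_)
print(round(search.best_score_, 3))
```

Searching over the whole pipeline rather than the bare classifier means the scaler is refit on each training fold, so the cross-validation scores are not inflated by information from the held-out fold.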

Prerequisites:

No programming knowledge is required; it is a GUI after all. Anyone with a data science use-case but without the necessary know-how can attend this session, although some statistics or basic machine learning knowledge is good to have.

Content URLs:

https://github.com/SurukamAnalytics/pyprofound

Speaker Info:

I am a fourth-year undergraduate student at IIT Kharagpur. I contributed to PyProfound as part of my internship at Surukam Analytics. I am very enthusiastic about machine learning in general and natural language processing in particular, which is where most of my attention and focus now goes.

Speaker Links:

You can find me on GitHub at https://github.com/rishiag25

Section:	Data Visualization and Analytics
Type:	Talks
Target Audience:	Beginner
Last Updated:	30 Jun, 2016
