Feature Engineering - Making an algorithm understand the data



A model can only perform at its best if it understands the data as well as possible. Most algorithms understand only numeric data, but in practice it is impossible to have every feature in numeric form. This talk will walk you through techniques for handling various types of features. Every Machine Learning algorithm is a beauty of its own, and each has different performance characteristics on different types of data. While Decision Trees can handle non-numeric features (words), k-NN can only work on numeric features. Thus, for a data scientist or ML engineer, Feature Engineering is ultimately the most important task alongside proper model selection: it is the key to letting the algorithm understand the data. This talk will introduce the most useful Feature Engineering techniques for various datatypes so that algorithms can process the data, e.g. label encoding, one-hot encoding, etc. The outline of the talk would be approximately as follows:
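As a taste of the encoding techniques mentioned above, here is a minimal sketch using pandas; the toy data is an illustrative assumption, not an example from the talk itself:

```python
import pandas as pd

# Toy categorical feature (illustrative data, not from the talk)
df = pd.DataFrame({"city": ["Delhi", "Mumbai", "Delhi", "Pune"]})

# Label encoding: map each category to an integer code
# (fine for tree-based models such as Decision Trees)
df["city_label"] = df["city"].astype("category").cat.codes

# One-hot encoding: one binary column per category
# (needed by distance-based models such as k-NN)
one_hot = pd.get_dummies(df["city"], prefix="city")

print(df)
print(one_hot)
```

Label encoding imposes an arbitrary ordering on the categories, which distance-based algorithms would misinterpret; one-hot encoding avoids that at the cost of extra columns.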

  1. Introduction to Feature Engineering (2-3 mins)

  2. How do general ML algorithms work (2-3 mins)

  3. Why algorithms require feature engineering for best performance (5 mins)

  4. What kinds of feature engineering can be done: (2-3 mins each)

a. Imputation

b. Coping with outliers

c. Binning

d. Transformation

e. Feature Split

f. Creating new relevant features

  5. How Feature Engineering can turn out to be the quality that differentiates a good data scientist from a bad one (5 mins)
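A minimal pandas sketch of several of the outlined techniques; the toy data, bin edges, and percentile thresholds are illustrative assumptions, not part of the proposal:

```python
import numpy as np
import pandas as pd

# Toy numeric feature with a missing value and an outlier (illustrative only)
ages = pd.Series([22, 25, np.nan, 31, 120])

# a. Imputation: fill missing values, e.g. with the median
ages_filled = ages.fillna(ages.median())

# b. Coping with outliers: clip to the 5th-95th percentile range
ages_clipped = ages_filled.clip(ages_filled.quantile(0.05),
                                ages_filled.quantile(0.95))

# c. Binning: bucket the continuous feature into ranges
age_bins = pd.cut(ages_clipped, bins=[0, 30, 60, 130],
                  labels=["young", "mid", "senior"])

# d. Transformation: log-transform to reduce skew
ages_log = np.log1p(ages_clipped)

# e./f. Feature split and creating new features: derive parts
# of a composite feature, here the month from a date string
dates = pd.to_datetime(pd.Series(["2019-07-01", "2019-12-25"]))
month = dates.dt.month
```

Each step is a one-liner in pandas, but choosing *which* step to apply (and with what thresholds) is where the judgment discussed in the talk comes in.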


Basic knowledge of model building and beginner-level exposure to Machine Learning and data science, enough to understand what features and labels are and how basic ML algorithms work.

Content URLs:


Speaker Info:

Hello everyone, I'm Saurabh Wani, a third-year undergrad from NIT Jalandhar. I've been a core team member of PyData Jalandhar for the last 5 months, and I have delivered several talks on topics like 'Introduction to Recommendation Systems', 'Multiclass Classification of Imbalanced Data', etc. to audiences ranging from absolute beginners to the Kaggle top 1%. I've worked a lot on building ML models, so I know the importance of, and the challenges in, handling non-numeric categorical features, features that make no sense until some alterations are done, and the like. To save the listeners effort during the exploratory phase, I'll do my best to introduce them to all the required Feature Engineering techniques at a beginner level. Along with this, I'll also give them a brief idea of which algorithms require what kinds of data and which feature engineering technique suits them best. I'll illustrate the techniques using Python packages. I hope you like my proposal and the topic of the talk. Looking forward to hearing from you soon. Thank you.

Speaker Links:


Id: 1159
Section: Data Science, Machine Learning and AI
Type: Talks
Target Audience: Intermediate
Last Updated: