Feature Engineering for Kaggle and Machine Learning Competitions
Mohammad Shahebaz (~shaz13)
With advances in machine learning and artificial neural networks, answers to previously open questions are coming to the surface. It is data and feature engineering that have made AI and ML one of the great talking points of the 21st century. However complex and capable an algorithm is at solving a task, there is always a need to crunch the numbers correctly through feature engineering, which helps the model understand trends and classes better. This proposal covers feature engineering for competitive machine learning problems on platforms such as Kaggle, Analytics Vidhya, and HackerEarth. It also covers a case study of a winning solution and lessons drawn from other competitions.
The talk will cover the following topics.
- What difference can feature engineering make?
- Feature Engineering using Python
- Numerical techniques
- Categorical techniques
- Variable Interactions
- Decomposition techniques
- Using Google Scholar and domain knowledge
- Case studies on winning competitions
- Live problem-solving on a Kaggle competition
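As a rough illustration of the technique families listed above, here is a minimal sketch in Python using pandas and NumPy. The toy data, the column names (`price`, `quantity`, `city`), and the specific transforms chosen are assumptions made for this example; they are not taken from the talk or from any winning solution.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; column names and values are illustrative only.
df = pd.DataFrame({
    "price": [10.0, 200.0, 35.0, 1200.0, 50.0, 75.0],
    "quantity": [1, 3, 2, 1, 5, 4],
    "city": ["Pune", "Delhi", "Pune", "Mumbai", "Delhi", "Pune"],
})

# Numerical technique: log transform to compress a skewed range.
df["log_price"] = np.log1p(df["price"])

# Categorical technique: frequency encoding (share of rows per category).
city_freq = df["city"].value_counts(normalize=True)
df["city_freq"] = df["city"].map(city_freq)

# Variable interaction: product of two numeric columns.
df["revenue"] = df["price"] * df["quantity"]

# Decomposition technique: project the standardized numeric columns
# onto their first principal component, computed via SVD.
numeric = df[["log_price", "quantity", "revenue"]].to_numpy(dtype=float)
standardized = (numeric - numeric.mean(axis=0)) / numeric.std(axis=0)
_, _, vt = np.linalg.svd(standardized, full_matrices=False)
df["pc1"] = standardized @ vt[0]

print(df.round(3))
```

In a competition setting, encodings such as `city_freq` would normally be fitted on the training split only and then applied to validation and test data, so that no information leaks across splits.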
Material links -
- Winning Solution for Analytics Vidhya Hiring Hackathon
- Winning Solution for TechGig Machine Learning Hackathon
- Feature Engineering by Kaggle Expert
- Organization for learning competitive data science solutions - MLByte
Speakers: Sudarshan Gadhave and Mohammad Shahebaz
Sudarshan Gadhave is a Data Science, Data Engineering, and Data Integration professional with over 8 years of experience working on machine learning, data engineering, data visualization, and data warehousing projects. Currently, he works as a Specialist Data Scientist in the Analytics R&D team of Nice Actimize (Nice Systems), developing anomaly and fraud detection models. He previously worked in the Advanced Analytics and Data Warehousing teams of NEC, Japan and John Deere (Deere & Company). He is a Pythonista and an expert in the Python machine learning stack (NumPy, pandas, scikit-learn, Matplotlib), as well as an active core member of the Python Pune meetup group. He has presented several talks on Python and machine learning at meetups, conferences, and colleges all over Pune.
Mohammad Shahebaz is a data scientist intern at Analytics Vidhya. He was also India's finalist in the Microsoft World Championship 2013 and a finalist at Master Orator Champion 2016, and he has won a regional gold medal in the International Maths Olympiad (IMO). He currently follows the latest trends in machine learning and artificial intelligence while earning competitive positions in national-level competitions and on the Kaggle platform. He loves open source and has contributed to organizations such as Google Web Fundamentals, Scikit Learn, and FOSSASIA, and he is serving as Social Committee Lead at Oppia.org in Google Summer of Code. On a path to bring machine learning and artificial intelligence to the Indian masses, he open-sources his code and approaches on GitHub and at the organization MLBYTE.
- GitHub: https://github.com/sudarshan1413
- LinkedIn: https://www.linkedin.com/in/sudarshan-gadhave-73567b23/
- Shahebaz LinkedIn Profile
- Shahebaz GitHub Profile
- Rank 2 at Analytics Vidhya overall leaderboard
- Kaggle Profile
Mentions
1. Master Orator Champion
2. 1st runner-up of TechGig Machine Learning Hackathon - June 8, 2018