Text Classification through Feature Engineering, Selection, Ensembling & Metaheuristics



This talk shows how to achieve higher text classification performance by combining:

  • Feature engineering
  • Feature selection
  • Ensembling
  • Optimizing the feature selection and ensemble learning process through a combination of search and metaheuristic algorithms

Most importantly, this talk will discuss feature selection, which is often overlooked in text classification: the different methods for doing it and which of them work better than others.
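As a rough illustration of one common feature selection method, the sketch below scores each term with a chi-squared statistic against the class label and keeps only the top-scoring terms. The toy corpus, the `chi2_scores` helper and the cutoff `k` are assumptions made up for this example, not material from the talk; on real data, scikit-learn's `SelectKBest` with `chi2` covers the same idea.

```python
from collections import Counter

# Toy labelled corpus (hypothetical data, for illustration only).
docs = [
    ("cheap pills buy now", "spam"),
    ("buy cheap watches now", "spam"),
    ("limited offer buy now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("please review the agenda", "ham"),
    ("monday meeting moved to now", "ham"),
]


def chi2_scores(docs, target):
    """Chi-squared score of each term against `target`, using document presence."""
    n = len(docs)
    n_target = sum(1 for _, label in docs if label == target)
    df = Counter()         # number of documents containing the term
    df_target = Counter()  # number of target-class documents containing the term
    for text, label in docs:
        for term in set(text.split()):
            df[term] += 1
            if label == target:
                df_target[term] += 1

    scores = {}
    for term, present in df.items():
        a = df_target[term]    # term present, target class
        b = present - a        # term present, other class
        c = n_target - a       # term absent, target class
        d = n - n_target - b   # term absent, other class
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        # A term that occurs in every document carries no information.
        scores[term] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


scores = chi2_scores(docs, target="spam")
k = 6  # hypothetical cutoff; in practice this is tuned on validation data
selected = sorted(scores, key=scores.get, reverse=True)[:k]
print(f"vocabulary: {len(scores)} terms, kept: {len(selected)}")
print(sorted(selected))
```

Only the selected terms would then be passed to the vectorizer, so every document vector shrinks from the full vocabulary size to `k` dimensions.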

Feature selection not only improves model performance but also reduces vector and model size. The talk will also cover how to combine feature selection with different types of features and with ensembling.
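To make the ensembling and search idea concrete, here is a minimal sketch that blends the predicted probabilities of three models, each imagined as trained on a different feature type, and tunes the blend weights with stochastic hill climbing, a very simple local-search metaheuristic. The probabilities, labels, function names and step size are all hypothetical, and the talk's own methodology may use different algorithms.

```python
import random

# Hypothetical validation-set probabilities of the positive class from three
# models, e.g. trained on word n-grams, character n-grams and syntactic n-grams.
probs = [
    [0.9, 0.4, 0.6, 0.2, 0.7, 0.3],
    [0.6, 0.7, 0.4, 0.4, 0.3, 0.6],
    [0.8, 0.6, 0.3, 0.1, 0.6, 0.7],
]
labels = [1, 1, 0, 0, 1, 0]


def accuracy(weights):
    """Accuracy of the weighted-average ensemble at a 0.5 threshold."""
    total = sum(weights)
    correct = 0
    for i, y in enumerate(labels):
        p = sum(w * model[i] for w, model in zip(weights, probs)) / total
        correct += int((p >= 0.5) == bool(y))
    return correct / len(labels)


def hill_climb(steps=200, seed=0):
    """Perturb the weights at random and keep any candidate that is not worse."""
    rng = random.Random(seed)
    best = [1.0] * len(probs)  # start from equal weights
    best_acc = accuracy(best)
    for _ in range(steps):
        # Gaussian perturbation, clipped so that weights stay positive.
        cand = [max(0.01, w + rng.gauss(0, 0.3)) for w in best]
        acc = accuracy(cand)
        if acc >= best_acc:
            best, best_acc = cand, acc
    return best, best_acc


baseline_acc = accuracy([1.0, 1.0, 1.0])
weights, tuned_acc = hill_climb()
print(f"equal weights: {baseline_acc:.2f}, tuned weights: {tuned_acc:.2f}")
```

Because the weights are tuned on the same data used to score them, the final ensemble should be evaluated on a separate held-out set. The same loop could also search over which feature subsets or models to include rather than only their weights.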

Talk Outline

0-5 mins: Introduction.

5-15 mins: Current methods.

15-25 mins: Main agenda (my methodology for text classification through feature engineering, selection, and ensembling, and for optimizing the process through metaheuristics and other search algorithms).

25-30 mins: Closing remarks and questions.


Prerequisites:

  • An interest in NLP and text classification, especially in how to improve text classification model performance.
  • Basic understanding of different text vector representations.
  • Basic understanding of ensemble learning.

Video URL:


Content URLs:


Speaker Info:

Data Scientist with 10+ years of experience. Interests include NLP, signal processing and mathematical optimization. Author of 4 Python libraries used in machine learning, geo-spatial data analysis and signal processing.

Speaker Links:

Links to Python libraries and previous talks


TextFeatureSelection: PyPI Python library for feature selection for text classification

SNgramExtractor: PyPI Python library for syntactic n-gram feature extraction

Example code to create a Flask API for a Keras deep learning NLP model

Example code to extract noun and adjective pairs using a context-free grammar

Signal Processing

BaselineRemoval: PyPI Python library for baseline removal from spectral data

Geo-spatial data analysis

Pandas2Shp: PyPI Python library for creating shp files for geo-spatial analysis from longitude and latitude

Previous meetup talk slides

Dependency Parsing in NLP

Combining NLP and Elasticsearch to build a semantic search application

Section: Data Science, Machine Learning and AI
Type: Talks
Target Audience: Advanced
Last Updated: