Using genetic algorithms to improve text classification

azimulhaq


Description:

Achieving higher text classification performance through a combination of:

  • Feature engineering
  • Feature selection
  • Ensembling
  • Optimizing the feature selection and ensemble learning processes with a combination of search algorithms and genetic algorithms.
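
As a rough illustration of the last point, here is a generic genetic algorithm that evolves binary feature masks (1 = keep the feature) using tournament selection, one-point crossover, bit-flip mutation and elitism. This is a minimal sketch, not the speaker's methodology or the EvolutionaryFS API. The function `genetic_feature_selection`, its parameters and the toy fitness function are hypothetical. In a real pipeline the fitness would more likely be a validation score of a classifier trained on the selected features.

```python
import random


def genetic_feature_selection(n_features, fitness, pop_size=20, generations=30,
                              crossover_rate=0.9, mutation_rate=0.05, seed=0):
    """Evolve binary masks (1 = keep feature i) that maximize fitness(mask).

    Hypothetical helper for illustration; requires n_features >= 2 and pop_size >= 2.
    """
    rng = random.Random(seed)
    # Initial population of random binary masks.
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    best_score, best_mask = float("-inf"), None

    def tournament(scored):
        # Tournament selection: the fitter of two random individuals wins.
        (score_a, mask_a), (score_b, mask_b) = rng.sample(scored, 2)
        return mask_a if score_a >= score_b else mask_b

    for _ in range(generations):
        scored = [(fitness(mask), mask) for mask in population]
        for score, mask in scored:
            if score > best_score:
                best_score, best_mask = score, mask[:]
        # Elitism: the best mask found so far survives unchanged.
        next_population = [best_mask[:]]
        while len(next_population) < pop_size:
            parent_a, parent_b = tournament(scored), tournament(scored)
            if rng.random() < crossover_rate:
                # One-point crossover.
                point = rng.randint(1, n_features - 1)
                child = parent_a[:point] + parent_b[point:]
            else:
                child = parent_a[:]
            # Bit-flip mutation.
            child = [1 - bit if rng.random() < mutation_rate else bit
                     for bit in child]
            next_population.append(child)
        population = next_population
    return best_mask, best_score


# Hypothetical toy fitness: reward keeping "informative" features 0, 2 and 5,
# with a small penalty per kept feature to favor smaller vectors.
INFORMATIVE = {0, 2, 5}


def toy_fitness(mask):
    kept = {i for i, bit in enumerate(mask) if bit}
    return len(kept & INFORMATIVE) - 0.1 * len(kept)


best_mask, best_score = genetic_feature_selection(10, toy_fitness)
print(best_mask, best_score)
```

Because the best mask is copied into every new generation, the best score never decreases from one generation to the next. Most of the real cost lies in the fitness function, since each evaluation would mean training and scoring a model.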

Most importantly, this talk focuses on feature selection, which is often overlooked in text classification. It covers the different methods of doing feature selection and which of them work better than others.
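
One widely used family of feature selection methods for text is the filter method, which scores each term independently of any classifier. Below is a minimal sketch of one such filter, the chi-square statistic computed from a 2x2 term/class contingency table of document counts. The talk does not necessarily recommend this method over others. The function names `chi_square_scores` and `select_top_k` are hypothetical, and each term's score is taken as its maximum over classes (one common convention for multi-class data).

```python
from collections import Counter


def chi_square_scores(docs, labels):
    """Score each term by its maximum chi-square statistic over classes.

    docs: list of token lists; labels: list of class labels of the same length.
    Only term presence/absence per document is used (document frequency).
    """
    n_docs = len(docs)
    doc_sets = [set(doc) for doc in docs]
    class_counts = Counter(labels)
    vocab = set().union(*doc_sets)
    scores = {}
    for term in vocab:
        # Number of documents of each class that contain the term.
        term_in_class = Counter(label for terms, label in zip(doc_sets, labels)
                                if term in terms)
        df = sum(term_in_class.values())
        best = 0.0
        for cls, n_cls in class_counts.items():
            a = term_in_class[cls]       # in class, contains term
            b = df - a                   # other classes, contains term
            c = n_cls - a                # in class, lacks term
            d = n_docs - n_cls - b       # other classes, lacks term
            denom = (a + c) * (b + d) * (a + b) * (c + d)
            if denom:
                best = max(best, n_docs * (a * d - b * c) ** 2 / denom)
        scores[term] = best
    return scores


def select_top_k(scores, k):
    # Keep the k highest-scoring terms; ties are broken alphabetically.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


docs = [["great", "movie", "fun"], ["great", "plot"],
        ["boring", "movie"], ["boring", "plot", "slow"]]
labels = ["pos", "pos", "neg", "neg"]
scores = chi_square_scores(docs, labels)
print(select_top_k(scores, 2))  # prints ['boring', 'great']
```

Terms such as "movie" and "plot", which appear equally often in both classes, score 0 and are the first to be dropped. Keeping only the top-k terms is what shrinks the document vectors. The choice of k is a tuning decision, and it is one place where a search or genetic algorithm can help.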

Feature selection can not only improve model performance but also reduce vector and model size. The talk also covers how to combine feature selection with different types of features and how to ensemble the resulting models.
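
Hard majority voting is one simple way to ensemble classifiers that were each trained on a different feature type. The talk's own ensembling approach may differ (for example, weighted voting or stacking). The helper `majority_vote` and the prediction lists below are hypothetical and serve only as an illustration.

```python
from collections import Counter


def majority_vote(*model_predictions):
    """Combine per-sample label predictions from several models by hard voting.

    Each argument is one model's list of predicted labels for the same samples.
    On a tie, Counter.most_common returns the tied label encountered first,
    i.e. the one from the earliest model in argument order.
    """
    lengths = {len(preds) for preds in model_predictions}
    if len(lengths) != 1:
        raise ValueError("all models must predict the same number of samples")
    return [Counter(votes).most_common(1)[0][0]
            for votes in zip(*model_predictions)]


# Hypothetical predictions from three classifiers, each trained on a different
# (feature-selected) feature type.
word_ngram_preds = ["pos", "neg", "neg", "pos"]
char_ngram_preds = ["pos", "pos", "neg", "neg"]
syntactic_ngram_preds = ["neg", "pos", "neg", "pos"]

print(majority_vote(word_ngram_preds, char_ngram_preds, syntactic_ngram_preds))
# prints ['pos', 'pos', 'neg', 'pos']
```

Voting helps most when the individual models make different mistakes, which is one reason to train them on different feature types.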

Talk Outline

0-5 mins: Introduction.

5-15 mins: Current methods.

15-25 mins: Main agenda (my methodology for text classification through feature engineering, feature selection and ensembling, and for optimizing that process with a genetic algorithm and other search algorithms).

25-30 mins: Closing remarks and questions.

Prerequisites:

  • An interest in NLP and text classification, especially how to improve text classification model performance.
  • Basic understanding of different text vector representations.
  • Basic understanding of ensemble learning.

Video URL:

https://www.youtube.com/watch?v=NMM_m9B6xIM

Content URLs:

https://github.com/StatguyUser/Pycon2021

Speaker Info:

Data scientist with 10+ years of experience. Interests include NLP, signal processing and mathematical optimization. Author of 5 Python libraries used in machine learning, geospatial data analysis and signal processing.

Speaker Links:

Links to Python libraries and previous talks

NLP

TextFeatureSelection: PyPI Python library for feature selection in text classification

SNgramExtractor: PyPI Python library for syntactic n-gram feature extraction

Example code for creating a Flask API for a Keras deep learning NLP model

Example code for extracting noun-adjective pairs using a context-free grammar

Signal Processing

BaselineRemoval: PyPI Python library for baseline removal from spectral data

Geospatial data analysis

Pandas2Shp: PyPI Python library for creating shapefiles (.shp) for geospatial analysis from longitude and latitude values

Evolutionary Algorithm

EvolutionaryFS: PyPI Python library for feature selection using evolutionary algorithms

Previous meetup talk slides

Dependency Parsing in NLP

Combining NLP and Elasticsearch to build a semantic search application

Section: Data Science, Machine Learning and AI
Type: Talks
Target Audience: Advanced