Bag-of-Features: Representing Text & Image Data as Numerical Vectors
Pranav Suri (~pranavsuri)
Most machine learning algorithms require feature vectors as inputs. In pattern recognition and machine learning, a feature vector is an n-dimensional vector of numerical features that represents some object (an image, a text, a sound). Feature engineering, the practice of extracting features from objects, is a combination of art and science: it pairs experimentation across multiple possibilities and automated techniques with the intuition and knowledge of a domain expert. Automating this process is called "feature learning," where a machine learns the features itself.
One way to obtain features is the 'Bag-of-Features' model, which simplifies the representation of an object to a collection of its subparts. Originally used for representing text data, the "Bag-of-Words" methodology can be extended to other types of objects, resulting in models such as "Bag-of-Visual-Words" and "Bag-of-Audio-Words." These models remain relevant in the age of self-learning deep networks because of their ability to work with limited data.
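As a rough illustration of the Bag-of-Words idea, here is a minimal sketch that uses only the Python standard library. The helper names (`build_vocabulary`, `bag_of_words`), the whitespace tokenizer, and the two-sentence corpus are assumptions made for this example, not material from the talk.

```python
from collections import Counter


def build_vocabulary(documents):
    """Map each distinct lowercase token to a fixed column index."""
    tokens = sorted({tok for doc in documents for tok in doc.lower().split()})
    return {tok: idx for idx, tok in enumerate(tokens)}


def bag_of_words(document, vocabulary):
    """Return a count vector; word order is discarded, unknown tokens are ignored."""
    counts = Counter(document.lower().split())
    # Iterating a dict yields keys in insertion order, which matches the indices.
    return [counts.get(tok, 0) for tok in vocabulary]


# Hypothetical toy corpus, used only for illustration.
corpus = ["the cat sat on the mat", "the dog chased the cat"]
vocab = build_vocabulary(corpus)
vectors = [bag_of_words(doc, vocab) for doc in corpus]

print(list(vocab))   # column order: cat, chased, dog, mat, on, sat, the
print(vectors[0])    # [1, 0, 0, 1, 1, 1, 2]
print(vectors[1])    # [1, 1, 1, 0, 0, 0, 2]
```

Each document becomes a fixed-length vector regardless of its length, which is what lets a standard classifier consume it. A real pipeline would likely use a proper tokenizer and a weighting scheme such as TF-IDF.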
The contents of the talk are:
- Introduction to Feature Engineering
- Working with Text Data
  - Understanding 'Bag-of-Words'
  - Example: Text Classification
- Working with Image Data
  - Introduction to 'Bag-of-Visual-Words'
  - Example: Image Classification
  - Comparing the performance to CNN
- Overview of 'Bag-of-Audio-Words'
- Generalizing 'Bag-of-Features'
This talk primarily discusses Bag-of-Words and Bag-of-Visual-Words through examples of text classification and image classification, respectively. It also covers concepts that generalize to models other than Bag-of-Features. The goal is to give audience members who have previously worked with numeric data some ideas for getting started with text and multimedia data.
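The step that carries over from words to visual or audio "words" can be sketched as follows: each local descriptor of an object is assigned to its nearest entry in a codebook, and the object is represented by the histogram of those assignments. In practice the codebook is commonly learned by clustering (for example, k-means over local image descriptors). Here the codebook is hard-coded, and the function names and 2-D descriptors are assumptions chosen to keep the example small.

```python
def nearest_codeword(descriptor, codebook):
    """Index of the codebook entry closest to the descriptor (squared Euclidean)."""
    distances = [
        sum((a - b) ** 2 for a, b in zip(descriptor, word)) for word in codebook
    ]
    return distances.index(min(distances))


def bag_of_features(descriptors, codebook):
    """Normalized histogram of codeword assignments for one object."""
    histogram = [0] * len(codebook)
    for descriptor in descriptors:
        histogram[nearest_codeword(descriptor, codebook)] += 1
    total = sum(histogram)
    if total == 0:
        # An object with no descriptors maps to the all-zero vector.
        return [0.0] * len(codebook)
    return [count / total for count in histogram]


# Hypothetical codebook of three "visual words" (assumed, not learned here).
codebook = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
# Hypothetical local descriptors extracted from a single image.
descriptors = [(0.1, 0.2), (0.9, 1.1), (1.2, 0.8), (4.8, 5.3)]

image_vector = bag_of_features(descriptors, codebook)
print(image_vector)  # [0.25, 0.5, 0.25]
```

The resulting vector has one dimension per codeword, so images with different numbers of descriptors still map to vectors of the same length. The same quantize-and-count step would apply to audio frames given an audio codebook.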
The prerequisites for the talk are:
- Intermediate knowledge of Python
- Familiarity with classification problems
- Familiarity with basic NLP/CV is helpful (but not necessary)
Will be updated soon after feedback.
I'm a fresh graduate in Computer Science & Engineering. I am passionate about Data Science, and I spend most of my time learning the skills required to excel in the domain. Outside of my professional interests, I am fond of rock music and reading.
- Blog: https://pranavsuri.com
- GitHub: https://github.com/pranavsuri
- LinkedIn: https://linkedin.com/in/suripranav
- Twitter: https://twitter.com/pranav_suri