Word Mover's Distance for document similarity

Rishab Goel (~RishabGoel)



Description:

Word Mover's Distance (WMD) is a new metric for calculating document similarity. In k-nearest-neighbour document classification experiments it achieved higher accuracy than LDA (Latent Dirichlet Allocation) and LSA (Latent Semantic Analysis, also known as Latent Semantic Indexing). It builds on Google's state-of-the-art word2vec word embeddings and on the widely studied Earth Mover's Distance, which comes from the transportation problem. Because word2vec places semantically related words close together, WMD can detect that two sentences are similar even when they have no words in common. WMD is implemented in Gensim, a topic-modelling library for Python.
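To make the idea concrete, here is a minimal sketch, assuming NumPy and SciPy are available. WMD treats each document as a normalized bag of words and finds the cheapest way to "move" all of one document's word mass onto the other's, where moving word i to word j costs the Euclidean distance between their embeddings. The sketch solves that transportation problem exactly with SciPy's `linprog`. The names `nbow`, `word_movers_distance` and `toy`, and the 2-D vectors, are hypothetical stand-ins for real 300-dimensional word2vec embeddings. This is not Gensim's implementation (Gensim exposes WMD as `KeyedVectors.wmdistance`).

```python
import numpy as np
from scipy.optimize import linprog


def nbow(tokens, vocab):
    """Normalized bag-of-words: weight of each vocab word = its count / total tokens."""
    counts = np.array([tokens.count(w) for w in vocab], dtype=float)
    return counts / counts.sum()


def word_movers_distance(doc1, doc2, embeddings):
    """Exact WMD between two token lists, solved as a transportation LP.

    minimize   sum_ij T[i, j] * c(i, j)
    subject to sum_j T[i, j] = d1[i],  sum_i T[i, j] = d2[j],  T >= 0
    where c(i, j) is the Euclidean distance between the two word vectors.
    """
    # Drop words without a vector (a crude stand-in for stop-word / OOV filtering).
    toks1 = [w for w in doc1 if w in embeddings]
    toks2 = [w for w in doc2 if w in embeddings]
    if not toks1 or not toks2:
        raise ValueError("each document needs at least one word with a vector")
    vocab1 = list(dict.fromkeys(toks1))  # unique words, first-seen order
    vocab2 = list(dict.fromkeys(toks2))
    d1, d2 = nbow(toks1, vocab1), nbow(toks2, vocab2)

    x1 = np.array([embeddings[w] for w in vocab1], dtype=float)
    x2 = np.array([embeddings[w] for w in vocab2], dtype=float)
    # cost[i, j] = ||x1[i] - x2[j]||, the "travel cost" from word i to word j
    cost = np.linalg.norm(x1[:, None, :] - x2[None, :, :], axis=2)
    n, m = cost.shape

    # The flow T is flattened row-major: T[i, j] -> variable i * m + j.
    a_eq = np.zeros((n + m, n * m))
    for i in range(n):
        a_eq[i, i * m:(i + 1) * m] = 1.0  # mass leaving word i equals d1[i]
    for j in range(m):
        a_eq[n + j, j::m] = 1.0  # mass arriving at word j equals d2[j]
    b_eq = np.concatenate([d1, d2])

    res = linprog(cost.ravel(), A_eq=a_eq, b_eq=b_eq,
                  bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.fun


# Hypothetical 2-D "embeddings", hand-picked so that paired words are close.
toy = {
    "obama": [1.0, 0.0], "president": [1.1, 0.1],
    "speaks": [0.0, 1.0], "greets": [0.1, 1.1],
    "media": [-1.0, 0.0], "press": [-1.1, -0.1],
    "illinois": [0.0, -1.0], "chicago": [0.1, -1.1],
    "banana": [5.0, 5.0],
}
s1 = "obama speaks to the media in illinois".split()
s2 = "the president greets the press in chicago".split()
print(word_movers_distance(s1, s2, toy))        # about 0.1414
print(word_movers_distance(s1, ["banana"], toy))  # about 7.11
```

The two example sentences share no content words, yet their distance is small because every word has a nearby counterpart to move to. Solving the LP exactly scales poorly with document length, since the number of flow variables is the product of the two vocabulary sizes.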

Prerequisites:

Basics of machine learning (neural networks), NLP and word2vec.

Content URLs:

https://github.com/RishabGoel/pycon_india_slides

Speaker Info:

Rishab Goel is a Master's student in Computer Science at IIT Delhi with a strong interest in deep learning (specifically RNNs) and data science.

Speaker Links:

https://github.com/RishabGoel

Section: Others
Type: Open space
Target Audience: Intermediate