Word Mover's Distance for document similarity
Rishab Goel (~RishabGoel)
Word Mover's Distance (WMD) is a recent metric for measuring document similarity. It has been reported to beat LDA (Latent Dirichlet Allocation) and LSA (Latent Semantic Analysis) in accuracy. It builds on Google's word2vec word vectors and on the Earth Mover's Distance, a widely studied transportation problem. Because words are compared through their word2vec vectors, WMD can detect that two sentences are similar even when they have no words in common. It is implemented in Gensim, a topic modelling library for Python.
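The idea can be illustrated with a minimal sketch in plain Python. It is not Gensim's implementation: the 2-D vectors in `TOY_VECTORS` are invented for illustration (real word2vec vectors are learned and have hundreds of dimensions), and the helper `wmd_equal_length` is a hypothetical name. The sketch only handles two documents with the same number of words. In that special case every word carries the same weight, so the optimal transport plan is a one-to-one matching of words, which can be found by brute force over permutations.

```python
from itertools import permutations
from math import dist

# Hypothetical 2-D "embeddings", made up for this example only.
# Related words are deliberately placed close together.
TOY_VECTORS = {
    "obama": (1.0, 1.0), "president": (1.1, 0.9),
    "speaks": (3.0, 1.0), "greets": (3.1, 1.2),
    "media": (5.0, 2.0), "press": (5.1, 2.1),
    "illinois": (7.0, 4.0), "chicago": (7.2, 4.1),
    "band": (-4.0, 9.0), "plays": (-3.0, 8.0),
    "concert": (-5.0, 7.0), "japan": (-6.0, 6.0),
}


def wmd_equal_length(doc_a, doc_b, vectors):
    """Word Mover's Distance for two token lists of equal length.

    Each token carries weight 1/n, so the cheapest way to "move" one
    document onto the other is a one-to-one matching of tokens. The
    search is brute force, so this is only practical for short documents.
    """
    if len(doc_a) != len(doc_b) or not doc_a:
        raise ValueError("this sketch needs two non-empty documents of equal length")
    n = len(doc_a)
    # Try every matching and keep the smallest total travel cost.
    best_total = min(
        sum(dist(vectors[a], vectors[b]) for a, b in zip(doc_a, matched))
        for matched in permutations(doc_b)
    )
    return best_total / n


# Stop words are assumed to have been removed already.
sentence_a = ["obama", "speaks", "media", "illinois"]
sentence_b = ["president", "greets", "press", "chicago"]
sentence_c = ["band", "plays", "concert", "japan"]

print(wmd_equal_length(sentence_a, sentence_b, TOY_VECTORS))  # no shared words, small distance
print(wmd_equal_length(sentence_a, sentence_c, TOY_VECTORS))  # unrelated, larger distance
```

With real data, documents differ in length and words repeat, so the general case needs a proper Earth Mover's Distance solver. In Gensim, the distance is exposed as the `wmdistance` method on loaded word vectors.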
Basics of machine learning (neural networks), NLP, and word2vec.
Rishab Goel is a Master's student in Computer Science at IIT Delhi with a strong interest in deep learning (RNNs specifically) and data science.