Document Clustering with Word2vec and Hierarchical Clusters

Karmanya Aggarwal (~CalmDownKarm)




Overall, the talk is about topic modelling, but I'd like to focus on two things in particular:

  1. Running LDA on a dataset, extracting the most popular themes, and then using word2vec and clustering to group those themes into clusters. Hierarchical clustering is used to fit the themes into a fixed number of labels, similar to what Google's NLP classification API attempts to do.

  2. Visualizing clusters of words, sentences, and phrases using dendrograms and t-SNE.
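
To give a feel for the clustering step in item 1, here is a minimal sketch of agglomerative (average-linkage) clustering that merges theme words until a fixed number of labels remains. It is an illustration under assumptions, not the code from the talk: the 3-dimensional vectors are made-up stand-ins for real word2vec embeddings, the theme words are invented rather than taken from an actual LDA run, and the names `THEME_VECTORS`, `cosine_distance`, `average_linkage` and `agglomerate` are hypothetical. It uses only the standard library so that it runs anywhere.

```python
import math
from itertools import combinations

# Made-up 3-d vectors standing in for word2vec embeddings of LDA theme words.
# Real vectors would come from a trained word2vec model (assumption).
THEME_VECTORS = {
    "football": [0.9, 0.1, 0.0],
    "cricket": [0.8, 0.2, 0.1],
    "election": [0.1, 0.9, 0.1],
    "senate": [0.0, 0.8, 0.2],
    "laptop": [0.1, 0.1, 0.9],
    "software": [0.2, 0.0, 0.8],
}


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def average_linkage(cluster_a, cluster_b, vectors):
    """Mean pairwise cosine distance between the members of two clusters."""
    dists = [
        cosine_distance(vectors[p], vectors[q])
        for p in cluster_a
        for q in cluster_b
    ]
    return sum(dists) / len(dists)


def agglomerate(vectors, n_labels):
    """Merge the two closest clusters repeatedly until n_labels remain."""
    clusters = [[name] for name in vectors]  # start with one cluster per word
    while len(clusters) > n_labels:
        # Find the pair of clusters (i < j) with the smallest average distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: average_linkage(
                clusters[pair[0]], clusters[pair[1]], vectors
            ),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe because j > i
    return clusters


clusters = agglomerate(THEME_VECTORS, n_labels=3)
for label, members in enumerate(clusters):
    print(label, sorted(members))
```

The sequence of merges performed inside `agglomerate` is exactly what a dendrogram draws. In practice this loop would likely be replaced by a library routine such as SciPy's `scipy.cluster.hierarchy.linkage` together with `fcluster`, which scale far better than this quadratic toy version.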

Finally, if time permits, I'd like to talk about Stitch Fix's lda2vec approach, but I expect the first two topics to fill the 30 minutes unless the audience is already very familiar with this material.


Some familiarity with clustering (k-means) is helpful but not required.

Recent graduate of BML Munjal University; developer at Gramener.

Section: Data science
Type: Talks
Target Audience: Intermediate
