Document Clustering with Word2vec and Hierarchical Clusters

Karmanya Aggarwal (~CalmDownKarm)




Overall, the talk is about topic modelling, but I'd like to focus on two things in particular:

  1. Running LDA on a dataset, extracting the most popular themes, and then using word2vec and hierarchical clustering to agglomerate those themes into a fixed number of labels. This is similar to what Google's NLP classification API attempts to do.

  2. Visualizing clusters of words, sentences and phrases using dendrograms and t-SNE.
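
To make the first point concrete, here is a minimal sketch of the agglomeration step only. It is an illustration, not the code from the talk: the theme words and their 3-dimensional vectors are made up and stand in for real LDA output and real word2vec embeddings, and the helper names (`cosine_distance`, `average_linkage`, `agglomerate`) are hypothetical. It uses only the standard library; in practice one would more likely use gensim for the embeddings and `scipy.cluster.hierarchy` for the linkage and the dendrogram.

```python
import math
from itertools import combinations

# Hypothetical toy vectors standing in for word2vec embeddings of
# theme words extracted by LDA. Real embeddings have 100+ dimensions.
THEME_VECTORS = {
    "football": (0.90, 0.10, 0.00),
    "cricket": (0.80, 0.20, 0.10),
    "tennis": (0.85, 0.15, 0.05),
    "election": (0.10, 0.90, 0.10),
    "parliament": (0.05, 0.95, 0.00),
    "stocks": (0.10, 0.10, 0.90),
    "markets": (0.00, 0.20, 0.95),
}


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def average_linkage(cluster_a, cluster_b, vectors):
    """Mean pairwise cosine distance between members of two clusters."""
    total = sum(
        cosine_distance(vectors[a], vectors[b])
        for a in cluster_a
        for b in cluster_b
    )
    return total / (len(cluster_a) * len(cluster_b))


def agglomerate(vectors, n_labels):
    """Merge the two closest clusters until only n_labels remain.

    Returns the final clusters and the merge history; the history
    (which clusters merged, at what distance) is the information a
    dendrogram plots.
    """
    clusters = [[word] for word in vectors]
    merges = []
    while len(clusters) > n_labels:
        # Find the pair of clusters with the smallest average distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: average_linkage(
                clusters[ij[0]], clusters[ij[1]], vectors
            ),
        )
        distance = average_linkage(clusters[i], clusters[j], vectors)
        merges.append((tuple(clusters[i]), tuple(clusters[j]), distance))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters, merges


clusters, merges = agglomerate(THEME_VECTORS, n_labels=3)
for cluster in clusters:
    print(sorted(cluster))
```

Stopping at a fixed `n_labels` is what maps an open-ended set of themes onto a fixed label set. Average linkage with cosine distance is just one reasonable choice here; other linkage criteria and distance measures will produce different trees.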

Finally, if time permits, I'd like to talk about StitchFix's LDA2vec approach, but I expect the first two topics to fill the 30 minutes unless the audience is already very familiar with these techniques.


Some familiarity with clustering (e.g. k-means) is helpful, but not required.

Content URLs: (Blog Post)

Speaker Info:

Recently graduated from BML Munjal University, Developer at Gramener.

Speaker Links:

Id: 863
Section: Data science
Type: Talks
Target Audience: Intermediate
Last Updated: