Document Clustering with Word2vec and Hierarchical Clusters

Karmanya Aggarwal (~CalmDownKarm)


Description:

Overall, the talk is about topic modelling. In particular, I'd like to cover two things:

  1. Running LDA on a dataset to extract the most prominent themes. Word2vec and clustering are then used to agglomerate those themes into clusters, with hierarchical clustering fitting the themes into a fixed number of labels. This is similar to what Google's NLP classification API attempts to do.

  2. Visualizing clusters of words, sentences, and phrases using dendrograms and t-SNE.
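Below is a minimal sketch of the clustering stage in step 1, plus the dendrogram structure from step 2. It assumes that LDA has already produced the top words for each theme and that word vectors are available.

- `WORD_VECTORS`, `THEMES`, `theme_vector` and `cluster_themes` are hypothetical names introduced only for illustration.
- The toy 3-dimensional vectors stand in for a trained word2vec model.
- Embedding a theme as the average of its top-word vectors is one common choice, not necessarily the exact method used in the talk.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage

# Hypothetical stand-in for a trained word2vec model (word -> vector).
# Real vectors would typically be 100-300 dimensional.
WORD_VECTORS = {
    "football": np.array([0.90, 0.10, 0.00]),
    "match":    np.array([0.80, 0.20, 0.10]),
    "goal":     np.array([0.85, 0.05, 0.10]),
    "cricket":  np.array([0.90, 0.00, 0.20]),
    "wicket":   np.array([0.80, 0.10, 0.20]),
    "stock":    np.array([0.10, 0.90, 0.00]),
    "market":   np.array([0.00, 0.80, 0.20]),
    "shares":   np.array([0.10, 0.85, 0.10]),
    "bank":     np.array([0.20, 0.80, 0.10]),
    "loan":     np.array([0.10, 0.90, 0.20]),
}

# Hypothetical LDA output: each theme is represented by its top words.
THEMES = {
    "theme_0": ["football", "match", "goal"],
    "theme_1": ["cricket", "wicket", "match"],
    "theme_2": ["stock", "market", "shares"],
    "theme_3": ["bank", "loan", "market"],
}


def theme_vector(words, vectors):
    """Embed a theme as the mean vector of its in-vocabulary top words."""
    known = [vectors[w] for w in words if w in vectors]
    if not known:
        raise ValueError(f"no known words in theme: {words}")
    return np.mean(known, axis=0)


def cluster_themes(themes, vectors, n_labels):
    """Agglomerate themes into at most n_labels flat clusters."""
    names = list(themes)
    X = np.vstack([theme_vector(themes[name], vectors) for name in names])
    # Average linkage over pairwise cosine distances between theme vectors.
    Z = linkage(X, method="average", metric="cosine")
    # Cut the tree at the height that leaves at most n_labels clusters.
    labels = fcluster(Z, t=n_labels, criterion="maxclust")
    return dict(zip(names, labels)), Z, names


assignments, Z, names = cluster_themes(THEMES, WORD_VECTORS, n_labels=2)
print(assignments)

# Compute the dendrogram layout without drawing it; with matplotlib
# installed, omit no_plot=True to render the tree.
tree = dendrogram(Z, labels=names, no_plot=True)
print(tree["ivl"])  # leaf labels in left-to-right dendrogram order
```

Passing `criterion="maxclust"` to `fcluster` is what enforces a fixed number of labels, whatever the raw number of LDA themes. For the t-SNE view, scikit-learn's `sklearn.manifold.TSNE(n_components=2)` can project the same theme (or word) vectors to two dimensions for plotting. Its `perplexity` must be smaller than the number of points being embedded.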

Finally, if time permits, I'd like to talk about Stitch Fix's LDA2vec approach. However, I expect the first two topics to fill the full 30 minutes unless the audience is already familiar with these techniques.

Prerequisites:

Some familiarity with clustering (e.g. k-means) is helpful but not required.

Content URLs:

http://www.calmdownkarm.com/2018/clustering (Blog Post)
https://github.com/CalmDownKarm/360classification (GitHub repository)

Speaker Info:

Recent graduate of BML Munjal University; developer at Gramener.

Speaker Links:

calmdownkarm.com

Section: Data science
Type: Talks
Target Audience: Intermediate