Document Clustering with Word2vec and Hierarchical Clustering
Karmanya Aggarwal (~CalmDownKarm)
Overall, the talk is about topic modelling, but I'd like to focus on two things in particular:
Running LDA on a dataset to extract the most popular themes, then using word2vec and clustering to agglomerate those themes into clusters. Hierarchical clustering is used to fit the themes into a fixed number of labels, similar to what Google's NLP classification API attempts to do.
Visualizing clusters of words, sentences and phrases using dendrograms and t-SNE.
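To make the two points above concrete, here is a minimal sketch of the clustering and visualization steps, not the exact code from the talk. It assumes the themes have already been extracted with LDA and embedded with word2vec; the theme words and the 3-dimensional vectors below are made-up stand-ins for real embeddings, and `cluster_themes` is a hypothetical helper name. It uses SciPy for the hierarchical clustering and dendrogram, and scikit-learn for t-SNE.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage
from sklearn.manifold import TSNE

# Hypothetical themes; the toy vectors stand in for word2vec embeddings.
themes = ["football", "cricket", "election", "parliament", "stocks", "banking"]
vectors = np.array([
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.1],
    [0.1, 0.9, 0.0],
    [0.2, 0.8, 0.1],
    [0.0, 0.1, 0.9],
    [0.1, 0.2, 0.8],
])


def cluster_themes(vectors, n_labels):
    """Agglomerate theme vectors, then cut the tree into at most n_labels groups."""
    # Average linkage on cosine distance is one common choice for word vectors.
    tree = linkage(vectors, method="average", metric="cosine")
    labels = fcluster(tree, t=n_labels, criterion="maxclust")
    return tree, labels


tree, labels = cluster_themes(vectors, n_labels=3)
for theme, label in zip(themes, labels):
    print(f"label {label}: {theme}")

# Dendrogram layout without drawing; drop no_plot=True to plot with matplotlib.
layout = dendrogram(tree, labels=themes, no_plot=True)
leaf_order = layout["ivl"]  # leaf labels, left to right
print(leaf_order)

# t-SNE projection to 2-d. Perplexity must be smaller than the number of samples,
# so it is set very low for this tiny toy dataset.
tsne = TSNE(n_components=2, perplexity=2, init="random", random_state=0)
coords = tsne.fit_transform(vectors)
for theme, (x, y) in zip(themes, coords):
    print(f"{theme}: ({x:.2f}, {y:.2f})")
```

Cutting the tree with `criterion="maxclust"` is what lets the number of output labels be fixed in advance, whatever number of themes LDA produced. With real embeddings, the vectors would have a few hundred dimensions and the t-SNE perplexity would be set much higher.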
Finally, if time permits, I'd like to talk about StitchFix's LDA2vec approach, but I expect the first two topics to fill the 30 minutes unless the audience is already very familiar with how this sort of thing works.
Some familiarity with clustering (k-means) is helpful, but not required.
http://www.calmdownkarm.com/2018/clustering (Blog Post) https://github.com/CalmDownKarm/360classification
Recently graduated from BML Munjal University, Developer at Gramener.