Topic Modelling with Python
Parul Sethi (~parulsethi)
Topic modelling is a great way to analyse completely unstructured textual data, and with the Python NLP framework Gensim it's very easy to do. The purpose of this tutorial is to guide you through the whole process of topic modelling: pre-processing the raw textual data, creating the topic models, evaluating them, and visualising them. We will also see its applications in a few NLP tasks: discovering topic correlations (with dendrograms), document clustering (demo with TensorBoard), and document analysis (using word coloring).
The Python packages used during the tutorial will be spaCy (for pre-processing), gensim (for topic modelling), and Visdom, pyLDAvis and Plotly (for visualisation). The interface for the tutorial will be a Jupyter notebook.
I'm a Pythonista studying Maths and IT at the University of Delhi. For the love of open source and NLP, I regularly contribute to gensim, a widely used Python library, and have also been selected as their GSoC (Google Summer of Code) student under the NumFOCUS umbrella for 2017 (my live blog).