Linguistics 101 - Natural Language Processing in Python with NLTK

Namit Juneja (~namitjuneja)




A look at how Natural Language Processing (NLP) can be used to identify meaningful information in text.

We will begin with a brief overview of the features of the NLTK library, and then develop a language-aware data product using a topic identification and document clustering algorithm built from a web crawl of blog sites. The clustering algorithm will use a simple Lesk K-Means clustering to start, and will then be improved with an LDA analysis.

The NLTK library ships with a pre-existing set of definitions and a standard workflow. By studying these we will get a feel for the features and functionality that NLTK has to offer. Since requirements differ widely from one project to the next, in the later part of the session we will build a topic identification and document clustering algorithm from a web crawl of blog sites using NLTK in a codelab.
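As a rough illustration of the document clustering step, here is a minimal sketch rather than the codelab's actual code. It assumes plain bag-of-words vectors and NLTK's `KMeansClusterer` with cosine distance. The sample documents, the tiny stop list, and the helper names `tokenize`, `vectorize` and `cluster_docs` are hypothetical stand-ins for the crawled blog posts and the session's real pipeline. The Lesk and LDA stages are not shown.

```python
import random
from collections import Counter

import numpy as np
from nltk.cluster import KMeansClusterer, cosine_distance
from nltk.tokenize import wordpunct_tokenize

# Hypothetical stand-ins for posts collected by the web crawl.
DOCS = [
    "Python makes text processing and tokenizing simple",
    "Tokenizing text in Python with NLTK is simple",
    "The football team won the match in extra time",
    "Our team lost the football match last night",
]

# Tiny hand-written stop list (an assumption, used so that no NLTK
# corpus download is needed to run this sketch).
STOPWORDS = {"the", "in", "and", "is", "with", "our", "a", "of"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stop words."""
    tokens = (t.lower() for t in wordpunct_tokenize(text))
    return [t for t in tokens if t.isalpha() and t not in STOPWORDS]


def vectorize(docs):
    """Turn each document into a term-count vector over a shared vocabulary."""
    token_lists = [tokenize(doc) for doc in docs]
    vocab = sorted({t for tokens in token_lists for t in tokens})
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        vectors.append(np.array([counts[w] for w in vocab], dtype=float))
    return vocab, vectors


def cluster_docs(docs, k=2, seed=0):
    """Assign each document to one of k clusters using NLTK's K-Means."""
    _, vectors = vectorize(docs)
    clusterer = KMeansClusterer(
        k,
        distance=cosine_distance,
        repeats=10,                   # several random restarts
        rng=random.Random(seed),      # fixed seed for repeatable runs
        avoid_empty_clusters=True,
    )
    return clusterer.cluster(vectors, assign_clusters=True)


if __name__ == "__main__":
    for doc, label in zip(DOCS, cluster_docs(DOCS)):
        print(label, doc)
```

Raw term counts are used here only to keep the sketch short. A real crawl would likely need TF-IDF weighting and a larger stop list before the clusters become meaningful.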

This is an interesting field within data science and an enjoyable one to learn.


Basic Python Programming Skills

Technical: A laptop with the development environment set up (Python and NLTK).

Content URLs:

Speaker Info:

Namit Juneja

Undergraduate at VIT University, Vellore.

He has worked extensively with startups in Shanghai, San Francisco, New Delhi and elsewhere in the fields of Data Analytics, Natural Language Processing and HCI.

He is an active contributor to research projects in the fields of data science and HCI at Stanford University.

Has won various hackathons by building products based on natural language processing, natural language generation and image processing.

Regularly participates in and conducts talk sessions at Google Developer Groups in Bengaluru and VIT Vellore.

Loves teaching new technologies to tech enthusiasts.

Currently pursuing a B.Tech and collaborating with CraftCloud (a disruptive image acquisition startup in New York) to develop an intelligent image processing platform.

Speaker Links:


Section: Data Visualization and Analytics
Type: Workshops
Target Audience: Intermediate
Last Updated: