Training a sense2vec model
This workshop will be a more detailed version of our previous workshop at PyData Delhi 2017: https://pydata.org/delhi2017/schedule/presentation/38/
Sense2vec - Neural word representations have proven useful in Natural Language Processing (NLP) tasks because they efficiently model complex semantic and syntactic relationships between words. However, most techniques learn only one representation per word, even though a single word can have multiple meanings or "senses". Some techniques represent a word with multiple vectors clustered by context, but recent neural approaches rarely consider how the embeddings will be used by a downstream ("consuming") NLP algorithm. Furthermore, training recent word-sense models is expensive compared with single-sense embedding methods. The sense2vec paper presents a novel approach that addresses these concerns by modeling multiple embeddings per word based on supervised disambiguation, which gives a consuming NLP model a fast and accurate way to select a sense-disambiguated embedding.
Source - Cornell University Library
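The core idea can be illustrated with a small sketch: instead of one vector per surface word, the vocabulary holds one key per (word, sense) pair, so "duck" used as a noun and "duck" used as a verb become separate tokens before Word2vec ever sees them. The sketch assumes the tokens have already been POS-tagged (a tagger such as spaCy would supply the tags) and uses POS tags as the sense labels. The `word|TAG` key format, the lowercasing and the helper name `to_sense_tokens` are illustrative choices, not a fixed API.

```python
from typing import Iterable, List, Tuple


def to_sense_tokens(tagged: Iterable[Tuple[str, str]], sep: str = "|") -> List[str]:
    """Turn (word, POS tag) pairs into sense-disambiguated tokens such as 'duck|NOUN'.

    Hypothetical helper: words are lowercased and the tag is appended after `sep`.
    """
    return [f"{word.lower()}{sep}{tag}" for word, tag in tagged]


# Pre-tagged input; in practice the tags would come from a POS tagger.
sentence_1 = [("I", "PRON"), ("saw", "VERB"), ("a", "DET"), ("duck", "NOUN")]
sentence_2 = [("You", "PRON"), ("should", "AUX"), ("duck", "VERB")]

print(to_sense_tokens(sentence_1))  # ['i|PRON', 'saw|VERB', 'a|DET', 'duck|NOUN']
print(to_sense_tokens(sentence_2))  # ['you|PRON', 'should|AUX', 'duck|VERB']
```

Because `duck|NOUN` and `duck|VERB` are different strings, an unmodified Word2vec trainer will learn a separate vector for each sense.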
Word2vec - Word2vec is a group of related models that are used to produce word embeddings. These models are shallow, two-layer neural networks that are trained to reconstruct linguistic contexts of words. Word2vec takes as its input a large corpus of text and produces a vector space, typically of several hundred dimensions, with each unique word in the corpus being assigned a corresponding vector in the space. Word vectors are positioned in the vector space such that words that share common contexts in the corpus are located in close proximity to one another in the space.
Source - Wikipedia
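"Close proximity" in the vector space is usually measured with cosine similarity. A minimal sketch of finding a word's nearest neighbours, using made-up 3-dimensional vectors purely for illustration (real Word2vec vectors have hundreds of dimensions and are learned from the corpus):

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity: dot product divided by the product of the vector norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest(word: str, vectors: Dict[str, Sequence[float]], k: int = 2) -> List[Tuple[str, float]]:
    """Return the k words whose vectors have the highest cosine similarity to `word`'s vector."""
    target = vectors[word]
    scored = [(other, cosine(target, vec)) for other, vec in vectors.items() if other != word]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy vectors invented for this example only.
toy_vectors = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.85, 0.9, 0.15],
    "apple": [0.1, 0.2, 0.95],
    "mango": [0.15, 0.1, 0.9],
}

print([w for w, _ in nearest("king", toy_vectors, k=1)])   # ['queen']
print([w for w, _ in nearest("apple", toy_vectors, k=1)])  # ['mango']
```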
Flask - Flask is a microframework for Python based on Werkzeug, Jinja2 and good intentions. And before you ask: it's BSD licensed!
Source - Flask
Workshop outline:
- Attendees will be given a corpus of roughly 500,000 news articles scraped from the web
- Tutorial: using spaCy for POS tagging and extracting its noun chunks to feed into Gensim's Word2vec
- Tutorial: training a Word2vec model with Gensim
- Tutorial: converting the Word2vec model into a sense2vec model
- Writing a REST service in Flask that returns sense2vec similarity results
- Integrating the REST service with the front-end
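A minimal sketch of the preprocessing step, assuming each sentence has already been tagged and chunked. With spaCy, the inputs could be built as `[(t.text, t.pos_) for t in doc]` and `[(c.start, c.end) for c in doc.noun_chunks]` (noun chunks need a pipeline with a dependency parser). The helper name `merge_noun_chunks`, the `_` joiner, labelling every merged chunk as `NOUN` and dropping determiners are illustrative assumptions, not necessarily what the workshop code does.

```python
from typing import List, Sequence, Tuple


def merge_noun_chunks(
    tokens: Sequence[Tuple[str, str]],
    chunks: Sequence[Tuple[int, int]],
    chunk_tag: str = "NOUN",
) -> List[str]:
    """Merge multi-word noun chunks into single sense tokens (hypothetical helper).

    tokens: (word, POS) pairs for one sentence.
    chunks: non-overlapping (start, end) token offsets, end exclusive,
            the same convention as spaCy's Span.start / Span.end.
    """
    chunk_end = {start: end for start, end in chunks}
    out: List[str] = []
    i = 0
    while i < len(tokens):
        if i in chunk_end:
            end = chunk_end[i]
            # Drop determiners so "the machine learning model" and
            # "machine learning model" map to the same key.
            words = [w.lower() for w, tag in tokens[i:end] if tag != "DET"]
            if words:
                out.append("_".join(words) + "|" + chunk_tag)
            i = end
        else:
            word, tag = tokens[i]
            out.append(f"{word.lower()}|{tag}")
            i += 1
    return out


tagged = [("The", "DET"), ("machine", "NOUN"), ("learning", "NOUN"),
          ("model", "NOUN"), ("improves", "VERB"), ("search", "NOUN")]
chunks = [(0, 4), (5, 6)]
print(merge_noun_chunks(tagged, chunks))
# ['machine_learning_model|NOUN', 'improves|VERB', 'search|NOUN']
```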
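Gensim's `Word2Vec` accepts any iterable of token lists and walks over it several times (once to build the vocabulary, then once per training epoch), so a large news corpus is best streamed from disk through a re-iterable object rather than a one-shot generator. A sketch, assuming the preprocessed corpus is stored as one space-separated sentence per line; the class name `SenseCorpus` and the file layout are hypothetical.

```python
import os
import tempfile
from typing import Iterator, List


class SenseCorpus:
    """Restartable iterable over a file with one pre-processed sentence per line.

    A plain generator would be exhausted after Word2Vec's first pass; an object
    whose __iter__ reopens the file can be iterated as many times as needed.
    """

    def __init__(self, path: str) -> None:
        self.path = path

    def __iter__(self) -> Iterator[List[str]]:
        with open(self.path, encoding="utf-8") as handle:
            for line in handle:
                tokens = line.split()
                if tokens:  # skip blank lines
                    yield tokens


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "corpus.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write("machine_learning|NOUN improves|VERB search|NOUN\n\nyou|PRON should|AUX duck|VERB\n")
    corpus = SenseCorpus(path)
    # Two passes yield the same sentences, as Word2Vec requires.
    print(sum(1 for _ in corpus), sum(1 for _ in corpus))  # 2 2
    # With gensim 4.x installed, training would look roughly like:
    # from gensim.models import Word2Vec
    # model = Word2Vec(sentences=corpus, vector_size=128, window=5, min_count=5, workers=4)
```

The `vector_size` parameter name applies to gensim 4.x; gensim 3.x called the same parameter `size`.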
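If the model was trained on `word|TAG` tokens, every vector key already carries a sense. One plausible way to turn it into a sense2vec lookup is to index the keys by surface word and let the caller pick a sense before querying. A sketch under that assumption; the function names are hypothetical. With a gensim 4.x model, the keys could come from `model.wv.index_to_key`, and the chosen key could then be passed to `model.wv.most_similar(positive=[key])`.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def index_senses(keys: Iterable[str], sep: str = "|") -> Dict[str, List[str]]:
    """Group sense keys such as 'duck|NOUN' and 'duck|VERB' under their surface word."""
    senses: Dict[str, List[str]] = defaultdict(list)
    for key in keys:
        word, _, _tag = key.rpartition(sep)
        if word:  # ignore keys that carry no sense tag
            senses[word].append(key)
    return dict(senses)


def pick_sense(word: str, senses: Dict[str, List[str]], preferred_tag: str = "", sep: str = "|") -> str:
    """Return the key for `preferred_tag` if that sense exists, else the first known sense."""
    candidates = senses.get(word.lower(), [])
    if not candidates:
        raise KeyError(f"no sense of {word!r} in the vocabulary")
    for key in candidates:
        if preferred_tag and key.rpartition(sep)[2] == preferred_tag:
            return key
    return candidates[0]


vocab = ["duck|NOUN", "duck|VERB", "goose|NOUN", "dodge|VERB", "machine_learning|NOUN"]
senses = index_senses(vocab)
print(senses["duck"])                       # ['duck|NOUN', 'duck|VERB']
print(pick_sense("Duck", senses, "VERB"))   # duck|VERB
print(pick_sense("duck", senses))           # duck|NOUN
```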
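A minimal Flask sketch of a similarity endpoint. The route `/similar`, its `word`, `sense` and `n` query parameters, and the JSON response shape are hypothetical. So are the tiny hand-written vectors, which stand in for a trained sense2vec model so the example runs on its own. Brute-force cosine ranking over every key is fine for a toy vocabulary; with a real model you would more likely delegate to the model's own similarity query.

```python
import math
from typing import Dict, List, Sequence

from flask import Flask, jsonify, request

# Toy sense vectors standing in for a trained model (invented values, illustration only).
SENSE_VECTORS: Dict[str, List[float]] = {
    "duck|NOUN": [0.9, 0.1, 0.0],
    "goose|NOUN": [0.85, 0.15, 0.05],
    "duck|VERB": [0.05, 0.9, 0.2],
    "dodge|VERB": [0.1, 0.85, 0.25],
}


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


app = Flask(__name__)


@app.route("/similar")
def similar():
    # Hypothetical query format: /similar?word=duck&sense=VERB&n=3
    word = request.args.get("word", "").lower()
    sense = request.args.get("sense", "").upper()
    n = request.args.get("n", default=5, type=int)
    key = f"{word}|{sense}"
    if key not in SENSE_VECTORS:
        return jsonify({"error": f"unknown key {key!r}"}), 404
    target = SENSE_VECTORS[key]
    ranked = sorted(
        ((other, cosine(target, vec)) for other, vec in SENSE_VECTORS.items() if other != key),
        key=lambda pair: pair[1],
        reverse=True,
    )[:n]
    return jsonify({"query": key, "similar": [{"key": k, "score": round(s, 4)} for k, s in ranked]})
```

The service could be started with `flask run` (pointing `FLASK_APP`, or `--app` on Flask 2.2+, at the module) and queried at, for example, `/similar?word=duck&sense=VERB&n=3`, which a front-end can call and render as a list of similar terms.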
Prerequisites and materials:
- Laptop with at least 8 GB of RAM (the workshop uses Ubuntu)
- Python 3 environment
- GitHub repo: https://github.com/kapoorabhish/sense2vec_workshop
- Slides: http://bit.ly/2w5pwN8
Tanu Mittal (Sr. Software Engineer) https://www.linkedin.com/in/tanu-mittal-16b12364/
Abhishek Kapoor (Software Engineer) https://www.linkedin.com/in/abhishek-kapoor-4b7b9295