Training a sense2vec model

kapoorabhish


Description:

This workshop will be a more detailed version of our previous workshop at PyData Delhi 2017: https://pydata.org/delhi2017/schedule/presentation/38/

Sense2vec - Neural word representations have proven useful in Natural Language Processing (NLP) tasks due to their ability to efficiently model complex semantic and syntactic word relationships. However, most techniques model only one representation per word, despite the fact that a single word can have multiple meanings or "senses". Some techniques model words by using multiple vectors that are clustered based on context. However, recent neural approaches rarely focus on the application to a consuming NLP algorithm. Furthermore, the training process of recent word-sense models is expensive relative to single-sense embedding processes. The sense2vec paper presents a novel approach which addresses these concerns by modeling multiple embeddings for each word based on supervised disambiguation, which provides a fast and accurate way for a consuming NLP model to select a sense-disambiguated embedding.
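The supervised disambiguation idea can be illustrated with a minimal sketch, assuming part-of-speech tags serve as the sense labels. Each (word, tag) pair is folded into a single vocabulary key such as `duck|NOUN`, so an ordinary word2vec trainer sees `duck|NOUN` and `duck|VERB` as different "words" and learns one vector per sense. The function name `sense_keys` and the hard-coded tagged sentences are hypothetical; in the workshop the tags would come from spaCy rather than being typed in by hand.

```python
from typing import Iterable, List, Tuple

# Hypothetical input format: (token_text, POS_tag) pairs, as a tagger might emit.
TaggedToken = Tuple[str, str]


def sense_keys(tokens: Iterable[TaggedToken]) -> List[str]:
    """Fold each (word, tag) pair into one 'word|TAG' vocabulary key.

    Because the tag is part of the key, the same surface word used with
    different tags becomes two separate vocabulary entries, so a word2vec
    trainer would learn a separate vector for each of them.
    """
    return [f"{word}|{tag}" for word, tag in tokens]


# Hand-tagged toy sentences (illustrative only, not from the workshop corpus).
sentence_1 = [("I", "PRON"), ("saw", "VERB"), ("a", "DET"), ("duck", "NOUN")]
sentence_2 = [("You", "PRON"), ("should", "AUX"), ("duck", "VERB"), ("now", "ADV")]

print(sense_keys(sentence_1))
print(sense_keys(sentence_2))
```

Feeding lists of such keys, instead of raw words, to a word2vec trainer is what gives each sense its own embedding.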

Source - Cornell University Library

Word2vec - Word2vec is a group of related models that are used to produce word embeddings. These models are shallow, two-layer neural networks that are trained to reconstruct linguistic contexts of words. Word2vec takes as its input a large corpus of text and produces a vector space, typically of several hundred dimensions, with each unique word in the corpus being assigned a corresponding vector in the space. Word vectors are positioned in the vector space such that words that share common contexts in the corpus are located in close proximity to one another in the space.
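To make "trained to reconstruct linguistic contexts" concrete, here is a small sketch of how word2vec's skip-gram variant builds its training examples: every token is paired with each neighbour inside a fixed-size window, and the network is then trained to predict the neighbour from the centre word. This only shows the data preparation step, not Gensim's internals; real implementations add refinements such as negative sampling and subsampling of frequent words. The function name `skipgram_pairs` is hypothetical.

```python
from typing import List, Tuple


def skipgram_pairs(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Return (centre, context) pairs for every token within `window` positions.

    Words that keep appearing in each other's windows across a corpus end up
    with similar vectors after training, which is why they sit close together
    in the resulting vector space.
    """
    pairs = []
    for i, centre in enumerate(tokens):
        # Clamp the window to the sentence boundaries.
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((centre, tokens[j]))
    return pairs


print(skipgram_pairs(["the", "cat", "sat", "down"], window=1))
```

With `window=1` each interior word contributes two pairs and each edge word one, giving six pairs for this four-word sentence.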

Source - Wikipedia

Flask - Flask is a microframework for Python based on Werkzeug, Jinja 2 and good intentions. And before you ask: It's BSD licensed!

Source - Flask

Workshop Structure

  • Attendees will be provided with a corpus of approximately 500,000 news articles scraped from the web

  • Tutorial on how to use spaCy for POS tagging and how to feed the noun chunks it provides to Gensim's Word2vec

  • Tutorial on how to use Gensim to create a Word2vec model

  • Tutorial on how to convert a Word2vec model to a Sense2vec model

  • Writing a REST service in Flask that returns similarity results from the Sense2vec model

  • Integrating the REST service with the front-end
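At its core, the similarity service in this pipeline answers one question: which sense keys lie closest to a given key? A minimal sketch of that lookup, assuming the trained vectors are available as a plain dictionary from sense key to vector, ranks every other key by cosine similarity. In practice Gensim's `KeyedVectors.most_similar` does this job on a trained model; the function names and the tiny 3-dimensional vectors below are hypothetical and invented purely for illustration, not trained output.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar(query: str, vectors: Dict[str, Vector], topn: int = 3) -> List[Tuple[str, float]]:
    """Rank all other keys by cosine similarity to `query`, highest first."""
    if query not in vectors:
        raise KeyError(f"unknown sense key: {query}")
    scores = [
        (key, cosine(vectors[query], vec))
        for key, vec in vectors.items()
        if key != query  # never return the query itself
    ]
    scores.sort(key=lambda kv: kv[1], reverse=True)
    return scores[:topn]


# Toy vectors, made up for this example only (NOT results from a trained model).
toy_vectors = {
    "duck|NOUN": [0.9, 0.1, 0.0],
    "goose|NOUN": [0.8, 0.2, 0.1],
    "duck|VERB": [0.0, 0.9, 0.3],
    "dodge|VERB": [0.1, 0.8, 0.4],
}

print(most_similar("duck|VERB", toy_vectors, topn=1))
print(most_similar("duck|NOUN", toy_vectors, topn=1))
```

Keeping the lookup as a plain function makes it easy to test on its own; a Flask view could then call it for the key sent by the front-end and return the ranked list with `flask.jsonify`.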

Prerequisites:

Speaker Links:

Tanu Mittal (Sr. Software Engineer) https://www.linkedin.com/in/tanu-mittal-16b12364/

Abhishek Kapoor (Software Engineer) https://www.linkedin.com/in/abhishek-kapoor-4b7b9295

Section: Data Analysis and Visualization
Type: Workshops
Target Audience: Intermediate