Creating a recommendation engine based on NLP and contextual word embeddings

Manas Ranjan Kar (~manasRK) | 30 Jun, 2016

Description:

How can we create a recommendation engine that is based both on user browsing history and product reviews? Can I create recommendations purely based on the 'intent' and 'context' of the search? How do I use natural language processing techniques to create valid recommendations?

This talk will showcase how a recommendation engine can be built from user browsing history and user-generated reviews using a state-of-the-art technique: word2vec. We will create something that not only matches the existing recommender systems deployed by websites, but goes one step further by incorporating context to generate valid and innovative recommendations. The beauty of such a framework is that it not only supports online learning, but is also sensitive to minor changes in user tone and behavior.

The secret sauce is how we account for 'context' and build it into our systems. The talk will answer this question and showcase the effectiveness of such a recommender system.

MOTIVATIONS & PRACTICAL APPLICATIONS:

Current recommender systems tend to misfire when user history is not known or new products are introduced into the mix. Missing user ratings add more complexity and hinder relevant recommendations. Also, current websites, especially in the domain of food or travel, don't allow a contextual search such as "best chicken tikkas in South Delhi". The results depend on keywords appearing in the title or description, and rarely on the reviews and user browsing history.

We have built two models using word2vec:

META MODEL: Created with the user browsing history. This contains more than 9.4 million product histories. The attempt is to mimic and improve upon existing collaborative filtering systems.

USER REVIEW MODEL: Websites currently don’t allow us to search on “context”. This model takes in reviews and attempts to create a framework for a “contextual search engine”.
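To make the idea behind the meta model concrete, here is a minimal sketch of treating each browsing session as a "sentence" of product IDs and recommending items whose contexts look alike. It is not the authors' implementation: the sessions are invented, the function names are hypothetical, and simple co-occurrence counts with cosine similarity stand in for trained word2vec vectors so that the example runs with the standard library alone.

```python
from collections import Counter, defaultdict
from math import sqrt

# Hypothetical browsing sessions: each one is an ordered list of product IDs,
# treated like a "sentence" whose "words" are products.
sessions = [
    ["tent", "sleeping_bag", "camp_stove"],
    ["tent", "sleeping_bag", "headlamp"],
    ["laptop", "mouse", "laptop_bag"],
    ["laptop", "mouse", "keyboard"],
]


def cooccurrence_vectors(sessions, window=2):
    """Build a sparse context-count vector for every product.

    This is a stand-in for word2vec: it counts neighbours inside a window
    instead of learning dense embeddings.
    """
    vectors = defaultdict(Counter)
    for session in sessions:
        for i, item in enumerate(session):
            start = max(0, i - window)
            end = min(len(session), i + window + 1)
            for j in range(start, end):
                if j != i:
                    vectors[item][session[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a if key in b)
    norm_a = sqrt(sum(value * value for value in a.values()))
    norm_b = sqrt(sum(value * value for value in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_similar(item, vectors, topn=3):
    """Rank all other products by similarity to `item` (ties broken by name)."""
    target = vectors.get(item, Counter())
    scores = [
        (other, cosine(target, vector))
        for other, vector in vectors.items()
        if other != item
    ]
    return sorted(scores, key=lambda pair: (-pair[1], pair[0]))[:topn]


vectors = cooccurrence_vectors(sessions)
recommendations = most_similar("tent", vectors)
for name, score in recommendations:
    print(f"{name}: {score:.3f}")
```

With real data, the same session lists could be fed to a word2vec implementation instead of the counting step, which would give dense vectors that generalise beyond exact co-occurrence.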

This is our attempt to propose and demonstrate a framework that’s more rounded and preserves context while generating recommendations. The current architecture looks something like this:

[Figure: Word2vec recommender architecture]

The top results have high affinity with each other and also occur on Amazon’s website itself in the “also bought/also viewed” section. The precision at 3 results (P@3) is currently 58%. However, P@15 is at 100%. We are currently improving the system and working on pre-processing techniques.
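For readers unfamiliar with the metric, precision at k is the fraction of the top k recommendations that are relevant. The sketch below shows the usual definition; the product IDs are made up, and treating membership in an "also bought/also viewed" list as the relevance judgement is an assumption here, not a description of how the figures above were measured.

```python
def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommended items that appear in `relevant`."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    return hits / k


# Hypothetical ranked recommendations and a hypothetical relevant set
# (for example, items listed under "also bought/also viewed").
recommended = ["B001", "B002", "B003", "B004"]
relevant = {"B001", "B003", "B009"}

p_at_3 = precision_at_k(recommended, relevant, 3)
print(f"P@3 = {p_at_3:.2f}")
```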

Prerequisites:

Participants must be well versed in Python and have a basic understanding of natural language processing. Code and required documentation will be provided after the session.

Content URLs:

Code: https://github.com/manasRK/word2vec-recommender

The code is currently messy and will be cleaned up.

Slides link: https://docs.google.com/presentation/d/1D4kdRbpHIZJ6YJc0huCRjiipNImC3rZxKSUdP_uub2U/edit?usp=sharing

Speaker Info:

Manas is currently leading the text analytics practice at Juxt Smart Mandate, a data science company. He likes helping clients make sense of their data and build a powerful case for business change using analytics in their respective companies.

He has architected multiple commercial NLP solutions in the areas of healthcare, foods & beverages, finance and retail. He is deeply involved in functionally architecting large-scale business process automation and deriving deep insights from structured & unstructured data using Natural Language Processing & Machine Learning.

To sum up his experience, he has worked on:

Application of machine learning to build text analytics solutions
Automate business processes for efficiency & productivity
Build algorithms for extracting multiple facets from text - gender of author, keywords, sentiment, taxonomies, concepts, entities
Combine and augment unstructured insights with structured data
Build recommendation engine for automated medical coding services
Build models to predict taxonomies for textual content
Create machine learning algorithms for topic detection & sentiments
Competitive intelligence algorithms to monitor events & trends for startups & SMEs

His detailed LinkedIn profile is https://in.linkedin.com/in/manasranjankar .

Manas has contributed to multiple NLP libraries like Gensim and Conceptnet5. He blogs regularly on NLP on multiple forums like Data Science Central, LinkedIn and his blog Unlock Text. He is currently ranked 1035th on Kaggle among more than half a million Kagglers worldwide. He loves teaching and mentoring students. He speaks regularly on NLP and analytics at national conferences and gives guest talks at IIM Lucknow and MDI Gurgaon. He has also mentored students from schools like ISB Hyderabad, BITS Pilani and Madras School of Economics.

Akhil Gupta is currently in the 4th year of his B.Tech at SRM University and works as a software developer at SRM Search Engine, a government-funded project. He has 2+ years of experience in data science with major expertise in data mining, text analysis, social media analysis and back-end architectures.

He has also worked in the areas of:

Natural language processing
Classification algorithm
Topic modelling
Clustering using probabilistic models
Twitter Mining.

He likes to build software that in some way eases human effort. Some examples are:

Content-based semantic image retrieval
Language model with features such as autocomplete, entity tagger, spell check and word segmenter
Entity tagger built on a Wikipedia dataset for entity tagging as well as domain identification
Topic modelling
Restaurant recommendation engine based on food items
Adjective and pronoun coreference resolution

His detailed LinkedIn profile is https://in.linkedin.com/in/akhilgupta0910 .

Speaker Links:

Manas Ranjan Kar

LinkedIn : https://in.linkedin.com/in/manasranjankar
Contribution to Gensim (PR #625): https://github.com/RaRe-Technologies/gensim/blob/develop/gensim/scripts/glove2word2vec.py
Blog: http://unlocktext.com/
Related Blog Article: http://unlocktext.com/index.php/2015/12/14/using-glove-vectors-in-gensim/
Context oriented NLP: https://www.linkedin.com/pulse/context-extraction-better-sentiment-analysis-manas-ranjan-kar?trk=prof-post
Analysing product reviews for context cues: http://www.datasciencecentral.com/profiles/blogs/impactful-text-analytics-for-smarter-businesses

Akhil Gupta

LinkedIn : https://in.linkedin.com/in/akhilgupta0910
Github : https://github.com/codeorbit
Contribution to SRMSE : https://github.com/SRMSE
Twitter : https://twitter.com/decoding_life

Section:	Data Visualization and Analytics
Type:	Talks
Target Audience:	Intermediate
Last Updated:	06 Aug, 2016
