Solving Logical Puzzles with Natural Language Processing
Ashutosh Trivedi (~codeAshu)
Ever imagined an algorithm that could solve classic Odd Man Out problems for you, or Analogy problems? Well, that's not impossible, provided you have a large amount of data. Consider how our brain solves these problems: if we have never seen the words used in the puzzle, we cannot solve it. We need contextual information about the words. Also, if you look carefully, both problems are quite similar: they rely only on memory, or more precisely on how you store contextual information.
So is there an algorithm that makes this possible? That is what I am going to talk about: how we can build such a contextual space of words. Google's Word2Vec is able to solve both of the problems above. Happily, there is an awesome implementation of it in Python: gensim. I will talk about how to train such a model and what to keep in mind while working with it.
Word2Vec creates a model that you can easily query for such tasks. You can ask your model:
if man --> king then woman --> ?
and it happily replies queen.
Let's be more optimistic and ask which of these does not match:
breakfast, cereal, dinner, lunch
As you wished, it says cereal.
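To make the idea behind these two queries concrete, here is a minimal pure-Python sketch. It assumes each word is already represented as a vector; the four-dimensional vectors in VECTORS are hypothetical, hand-picked toy values, not output of a trained model, and the helper names (cosine, analogy, odd_one_out) are made up for illustration. A real word2vec model learns vectors with hundreds of dimensions from text, but the arithmetic is the same in spirit.

```python
import math

# Hypothetical toy vectors, hand-picked for illustration only.
# Rough meaning of the dimensions: (royalty, gender, meal, grain).
VECTORS = {
    "man":       [0.0,  1.0, 0.0, 0.0],
    "woman":     [0.0, -1.0, 0.0, 0.0],
    "king":      [1.0,  1.0, 0.0, 0.0],
    "queen":     [1.0, -1.0, 0.0, 0.0],
    "breakfast": [0.0,  0.0, 1.0, 0.2],
    "lunch":     [0.0,  0.0, 1.0, 0.0],
    "dinner":    [0.0,  0.0, 1.0, 0.1],
    "cereal":    [0.0,  0.0, 0.3, 1.0],
}


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def analogy(a, b, c, vectors):
    """Answer 'a is to b as c is to ?' via the vector b - a + c."""
    target = [vb - va + vc
              for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    # The three query words themselves are excluded from the answer.
    candidates = (w for w in vectors if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(vectors[w], target))


def odd_one_out(words, vectors):
    """Return the word least similar to the mean of the unit vectors."""
    units = {}
    for w in words:
        norm = math.sqrt(sum(x * x for x in vectors[w]))
        units[w] = [x / norm for x in vectors[w]]
    dim = len(units[words[0]])
    mean = [sum(units[w][i] for w in words) / len(words) for i in range(dim)]
    return min(words, key=lambda w: cosine(units[w], mean))


print(analogy("man", "king", "woman", VECTORS))  # queen
print(odd_one_out(["breakfast", "cereal", "dinner", "lunch"], VECTORS))  # cereal
```

With these toy values, king - man + woman lands exactly on queen, and cereal is the word that points furthest away from the average of the four meal-related vectors.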
To achieve this accuracy we need an ample amount of data to build the contextual space for each word. Google has released vectors pre-trained on part of its Google News dataset (about 100 billion words). I'll demonstrate the word2vec model trained on it.
The algorithmic concepts will be covered from the basics. Only a knowledge of simple linear algebra and matrix multiplication is required.
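As a rough sketch of what querying such a pre-trained model could look like with gensim: the function name query_pretrained and its path argument are hypothetical, the vectors file has to be downloaded separately, and the snippet assumes a recent gensim release where pre-trained vectors are loaded through KeyedVectors (older releases exposed the same calls on the Word2Vec class). It is not taken from the demo repository.

```python
def query_pretrained(path):
    """Load word2vec-format vectors from `path` and run both puzzle queries.

    `path` is a hypothetical local path to a binary word2vec file, such as
    the pre-trained Google News vectors.
    """
    # Imported lazily so this module can be loaded without gensim installed.
    from gensim.models import KeyedVectors

    vectors = KeyedVectors.load_word2vec_format(path, binary=True)

    # man --> king, so woman --> ?
    best_word, _score = vectors.most_similar(
        positive=["king", "woman"], negative=["man"], topn=1
    )[0]

    # Which word does not belong with the others?
    odd_word = vectors.doesnt_match(["breakfast", "cereal", "dinner", "lunch"])

    return best_word, odd_word
```

Loading the full Google News vectors takes several gigabytes of memory, so a smaller vectors file may be more practical for experimenting.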
Presentation Link: https://drive.google.com/file/d/0BwceZGFFXIFrclB3MmtISEdOSUU/view?usp=sharing
Demo repository: https://github.com/codeAshu/pycon-India-2015-demos
See the bracketPy API functionality for reference. It gives a heads-up on what to expect from the session, since I will be sharing the methods that allowed me to build it.
word2vec paper by Google:
Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, and Jeffrey Dean. Distributed Representations of Words and Phrases and their Compositionality. In Proceedings of NIPS, 2013.
Ashutosh is a machine learning and NLP scientist, a graduate of IIITB, and the founder of bracketPy (beta). His areas of interest are NLP, machine learning, and distributed computing. He has also been an active Spark open source contributor and a speaker at the Apache Spark Bangalore Meetup.