Building a search engine that queries a Probabilistic Graphical Model (PGM)
Rohan SV (~rohan2)
A substantial number of search engineers in enterprises all over the world use conventional platforms such as Solr or ElasticSearch as their search engines. These systems allow efficient search over documents where the underlying data is typically categorical.
However, these traditional platforms fall short when:
- High-dimensional features need to be used for a vector based search
- Association mining based models need to be applied in the search
- Search results need to change in real-time based on feedback from users
A major responsibility of a search engineer in such enterprises is to tune the search engine so that its results are relevant. This typically requires specialized knowledge of these systems. Moreover, off-the-shelf technologies limit the capability and flexibility of the systems built on them.
In this talk, we will describe how we built a Python-based search engine that incorporates elements of machine learning and allows
Outline of the talk
1. About existing search systems
   a. What do they do?
   b. Disadvantages of these search engines
   c. Why build your own custom search engine?
2. Building blocks of a custom real-time search engine using Python
   a. Indexing for a search system
      - Reverse indexing for categorical variables
   b. Basic scoring & sampling methods
   c. Vector matching based ranking
   d. Custom scoring & ranking methods
3. Retrieving multiple associated documents in a single search operation is non-trivial
   a. How can we represent learned associations as a probabilistic graphical model?
   b. What form of an index allows efficient inference?
   c. Inference of a probabilistic graphical model as a search
4. Applying association rules learned in real time from user feedback, using reinforcement learning principles within the search engine
5. Extending this framework to enable personalisation
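To give a flavour of the first building blocks in the outline (reverse indexing for categorical variables, followed by vector matching based ranking), here is a minimal sketch. It is not the engine described in the talk: the toy catalogue `DOCS`, the function names, and the choice of cosine similarity as the vector match are all assumptions made for illustration.

```python
import math
from collections import defaultdict

# Hypothetical toy catalogue: categorical attributes plus a feature vector.
DOCS = {
    "d1": {"attrs": {"color": "red", "type": "dress"}, "vec": [1.0, 0.0, 0.2]},
    "d2": {"attrs": {"color": "red", "type": "shirt"}, "vec": [0.1, 1.0, 0.0]},
    "d3": {"attrs": {"color": "blue", "type": "dress"}, "vec": [0.9, 0.1, 0.3]},
}


def build_inverted_index(docs):
    """Map each (field, value) pair to the set of document ids that carry it."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for field, value in doc["attrs"].items():
            index[(field, value)].add(doc_id)
    return index


def candidates(index, filters):
    """Intersect the posting sets of every (field, value) filter."""
    postings = [index.get(item, set()) for item in filters.items()]
    return set.intersection(*postings) if postings else set()


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(docs, index, filters, query_vec, k=10):
    """Filter by categorical attributes, then rank survivors by vector similarity."""
    # With no filters, every document is a candidate.
    pool = candidates(index, filters) if filters else set(docs)
    ranked = sorted(pool, key=lambda d: cosine(docs[d]["vec"], query_vec), reverse=True)
    return ranked[:k]


index = build_inverted_index(DOCS)
results = search(DOCS, index, {"type": "dress"}, [1.0, 0.0, 0.0])
```

The categorical lookup narrows the candidate set cheaply before the more expensive vector comparison runs, which is one common way to combine the two steps; a production system would likely replace the exhaustive cosine loop with an approximate nearest-neighbour structure.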
Please find the slides at the following link: Basics of search systems
Vishwesh Kirthivasan is a Machine Learning Engineer at Mad Street Den. He works on reinforcement learning, drawing on statistics and linear algebra, and builds scalable search systems that use high-dimensional vectors.
Vishwesh has also been the bass guitarist at Motta Maadi Music, a space for jamming, for the past year.