Building a Production-Ready Search Engine using Python and Elasticsearch
Harshit Prasad (~harshit98)
Search is one of the most common actions we perform when we visit a website. Whether it’s an e-commerce website or a video streaming platform, search plays a major role, and getting it right can take thousands of hours of engineering effort. Elasticsearch is a popular service built on Apache Lucene (an open-source search engine written in Java) that powers search in numerous applications. It’s a distributed, near-real-time search system. Python is an excellent language for building a production-ready search engine on top of Elasticsearch in very little time.
In this talk, I will show how production-ready search engines can be developed in less time.
I will cover use cases of Python and Elasticsearch working together for indexing, document retrieval, and document scoring (or boosting). We will also discuss the common problems engineers face in keeping data in sync between a SQL database and Elasticsearch.
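As a rough illustration of the syncing and indexing idea, here is a minimal sketch that turns recently changed SQL rows into bulk actions. It is not the code from the talk: the `products` table, its `updated_at` and `is_deleted` columns, the index name, and both helper functions are assumptions made for the example, and an in-memory SQLite database stands in for the real SQL store. The action dictionaries follow the shape accepted by `elasticsearch.helpers.bulk`.

```python
import sqlite3

INDEX_NAME = "products"  # hypothetical index name


def changed_rows_to_actions(conn, since):
    """Yield one bulk action for every row modified after `since`."""
    cursor = conn.execute(
        "SELECT id, name, price, is_deleted FROM products "
        "WHERE updated_at > ? ORDER BY id",
        (since,),
    )
    for row_id, name, price, is_deleted in cursor:
        if is_deleted:
            # Soft-deleted rows are removed from the index.
            yield {"_op_type": "delete", "_index": INDEX_NAME, "_id": row_id}
        else:
            # Using the primary key as the document id makes re-indexing idempotent.
            yield {
                "_op_type": "index",
                "_index": INDEX_NAME,
                "_id": row_id,
                "_source": {"name": name, "price": price},
            }


def demo_connection():
    """Build a small in-memory table standing in for the SQL database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, "
        "price REAL, is_deleted INTEGER, updated_at INTEGER)"
    )
    conn.executemany(
        "INSERT INTO products VALUES (?, ?, ?, ?, ?)",
        [
            (1, "Basmati Rice", 120.0, 0, 100),
            (2, "Green Tea", 250.0, 0, 205),
            (3, "Olive Oil", 540.0, 1, 210),
        ],
    )
    return conn


if __name__ == "__main__":
    for action in changed_rows_to_actions(demo_connection(), since=200):
        print(action)
    # With a live cluster, the same iterable could be passed to
    # elasticsearch.helpers.bulk(client, actions).
```

Polling on a modification timestamp is only one possible strategy; it is simple, but it depends on every write updating `updated_at` and on deletes being recorded as soft deletes.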
- Problem Statement (2 min)
- Introduction to the problem statement.
- Introduction to Elasticsearch (3 min)
- Basic terminology and how it works.
- Strategies for Syncing Data (2 min + 1 min code)
- Keeping Elasticsearch in sync with the database.
- Document Indexing (2 min topic + 3 min code)
- Pythonic code performing data indexing to Elasticsearch.
- Role of Analyzers and Tokenizers (3 min topic + 2 min code)
- How analyzers and tokenizers can be useful in a search engine.
- Code walkthrough.
- Scoring and Search Results Retrieval (2 min topic + 5 min code)
- How scoring can improve search results.
- Code walkthrough.
- Q/A Session (5 min)
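The analyzer and scoring items in the outline above could be sketched roughly as follows. This is an illustration under stated assumptions, not the talk's actual code: the field names, the `autocomplete` analyzer, the boost values, and the `build_search_query` helper are all hypothetical, and the typeless `mappings` layout assumes Elasticsearch 7 or later. The block only builds request bodies as plain dictionaries, so it runs without a cluster.

```python
# Index body with a custom analyzer: the standard tokenizer splits text into
# words, then the filters lowercase them and emit edge n-grams for prefix matching.
INDEX_BODY = {
    "settings": {
        "analysis": {
            "filter": {
                "autocomplete_filter": {
                    "type": "edge_ngram",
                    "min_gram": 2,
                    "max_gram": 10,
                }
            },
            "analyzer": {
                "autocomplete": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "autocomplete_filter"],
                }
            },
        }
    },
    "mappings": {
        "properties": {
            # n-grams are generated at index time only; queries use the
            # standard analyzer so the search text is not split into n-grams.
            "name": {
                "type": "text",
                "analyzer": "autocomplete",
                "search_analyzer": "standard",
            },
            "description": {"type": "text"},
        }
    },
}


def build_search_query(text, boosts, size=10):
    """Build a multi_match query where each field carries its own boost."""
    fields = [f"{field}^{boost}" for field, boost in boosts.items()]
    return {
        "size": size,
        "query": {"multi_match": {"query": text, "fields": fields}},
    }


if __name__ == "__main__":
    # A match in `name` counts three times as much as one in `description`.
    print(build_search_query("rice", {"name": 3, "description": 1}))
```

Boosting fields at query time, rather than in the mapping, keeps the weights adjustable without re-indexing.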
Basic knowledge of Elasticsearch is helpful. However, I will give a brief overview of the Elasticsearch architecture.
Intermediate-level Python experience is helpful.
Link to Presentation (work in progress) - shorturl.at/owNY6
I’m a Software Engineering Intern at Grofers, India’s largest online grocery shopping platform. I’m an avid programmer who is passionate about code, design, and technology. I’m an open-source contributor and have worked with organizations such as HackerRank and CERN in the past.
I’ve been a Google Summer of Code student twice, in 2017 and 2018.
When I'm away from work, I like to play badminton, write blogs, and help people on Stack Overflow. I also love travelling and photography.
GitHub - https://github.com/harshit98
Linkedin - https://www.linkedin.com/in/harshit-prasad/
Twitter - https://twitter.com/HarshitPrasad8
Blog - https://email@example.com