Text Analysis using spaCy, with a Real-time Use Case: Resume Scoring in Python

sakthi vel (~sakthi06)



Every software company looks for talented resources to recruit, and this depends on the Talent Acquisition (TA) team. However, the TA team faces many challenges in finding the best-fit candidate for a particular job requisition.

Starting from a job description, the TA team has to answer questions such as:

  • What skills are required for this requisition?
  • Do we have matching candidate profiles for this requisition in our pool of resumes?
  • When we do have suitable resumes, in what order (scoring!) should we call the candidates for interview?
  • Who is the best-fit candidate?

In this tutorial we will demonstrate how text parsing can be implemented using spaCy, without requiring any deep learning experience.

What is spaCy:

spaCy is a popular and easy-to-use natural language processing library in Python. It offers state-of-the-art accuracy and speed, and has an active open-source community. However, since spaCy is a relatively new NLP library, it is not as widely adopted as NLTK.

We will cover the information extraction technique called Named Entity Recognition (NER), and how we can apply it to automatically generate summaries of resumes by extracting only the important entities, such as candidate name, technical/non-technical skills, education, experience, phone number and email ID. We then match these entities against the job description (JD) to derive a score.
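As a rough illustration of the extraction step, contact fields such as email ID and phone number are often pulled out with plain regular expressions instead of a trained NER model. The sketch below assumes that swap; the patterns and the `extract_contacts` helper are hypothetical and will miss many real-world formats.

```python
import re

# Illustrative patterns only (assumption): real resumes use many more formats.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
# A digit run of at least 10 characters, optionally starting with "+",
# allowing spaces or hyphens between digits.
PHONE_RE = re.compile(r"\+?\d[\d\s-]{8,}\d")


def extract_contacts(text):
    """Hypothetical helper: return emails and phone numbers found in text."""
    return {
        "emails": EMAIL_RE.findall(text),
        # Normalise phone numbers by removing spaces and hyphens.
        "phones": [re.sub(r"[\s-]", "", p) for p in PHONE_RE.findall(text)],
    }


print(extract_contacts("Contact: jane.doe@example.com, +91 98765 43210"))
```

Names, skills and education are less regular than contact details, which is where a statistical NER model earns its place.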

spaCy NER Model

spaCy provides an exceptionally efficient statistical system for named entity recognition in Python, which can assign labels to contiguous groups of tokens (spans).
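To show what "labels on contiguous groups of tokens" looks like in practice, the sketch below uses spaCy's rule-based `entity_ruler` component on a blank English pipeline. This is a swap-in for the statistical NER model, chosen so the example runs without training or downloading a model; it assumes spaCy v3 is installed, and the `SKILL` label and patterns are hypothetical.

```python
import spacy

# Blank English pipeline: tokenizer only, no trained components.
nlp = spacy.blank("en")

# Rule-based entity ruler (not the statistical NER discussed above).
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([
    # Token pattern: case-insensitive match on a single token.
    {"label": "SKILL", "pattern": [{"LOWER": "python"}]},
    # Multi-token pattern: two contiguous tokens form one entity span.
    {"label": "SKILL", "pattern": [{"LOWER": "machine"}, {"LOWER": "learning"}]},
    # Phrase pattern: exact text match.
    {"label": "SKILL", "pattern": "SQL"},
])

doc = nlp("Worked on Python and machine learning projects using SQL.")
skills = [(ent.text, ent.label_) for ent in doc.ents]
print(skills)
# [('Python', 'SKILL'), ('machine learning', 'SKILL'), ('SQL', 'SKILL')]
```

A trained NER component produces the same kind of `doc.ents` output, but generalises to spans that no hand-written pattern covers.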

High-level Outline

Basics of Text Classification using spaCy

  • Installing spaCy
  • Tokenizing the Text
  • Cleaning Text Data
  • Removing Stopwords from Our Data
  • Lemmatization
  • Part of Speech (POS) Tagging
  • Entity Detection
  • Dependency Parsing
  • Word Vector Representation
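The first few basics (tokenizing, cleaning and stop-word removal) can be sketched with a blank English pipeline, assuming spaCy v3 is installed. Lemmatization, POS tagging, entity detection, dependency parsing and word vectors need a trained pipeline such as `en_core_web_sm`, so they are left out here; the sample sentence is made up.

```python
import spacy

# Blank English pipeline: tokenizer plus English language data (stop words).
nlp = spacy.blank("en")

doc = nlp("Looking for a Python developer with experience in NLP and SQL.")

# Tokenizing: spaCy splits trailing punctuation into its own token.
tokens = [token.text for token in doc]

# Cleaning: drop stop words and punctuation, lowercase what remains.
cleaned = [token.lower_ for token in doc if not token.is_stop and not token.is_punct]

print(tokens)
print(cleaned)
```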

Use Case Topics

  • Train the Model - NER
  • Resume & JD Parsing
  • JD Matching & Scoring
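Once skills have been extracted from both a resume and the JD, one simple scoring rule is the percentage of JD skills that also appear in the resume, which directly answers "in what order should we call candidates?". The functions `score_resume` and `rank_candidates` below are hypothetical, and plain skill overlap is an assumed scoring rule, not necessarily the one used in the workshop.

```python
def score_resume(resume_skills, jd_skills):
    """Percentage of JD skills present in the resume (case-insensitive)."""
    jd = {s.lower() for s in jd_skills}
    if not jd:
        return 0.0
    found = {s.lower() for s in resume_skills} & jd
    return round(100 * len(found) / len(jd), 1)


def rank_candidates(candidates, jd_skills):
    """Return (name, score) pairs sorted from best to worst match."""
    scored = [(name, score_resume(skills, jd_skills)) for name, skills in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


jd = ["Python", "spaCy", "SQL", "Machine Learning"]
candidates = {
    "candidate_a": ["python", "sql"],
    "candidate_b": ["Python", "spaCy", "Machine Learning", "Java"],
}
print(rank_candidates(candidates, jd))
# [('candidate_b', 75.0), ('candidate_a', 50.0)]
```

Extra resume skills (such as Java above) neither help nor hurt under this rule; weighting must-have skills more heavily would be a natural refinement.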


Participants should have a working knowledge of Python, understand the basic principles of machine learning, and have at least basic experience with NLP concepts. However, knowledge of advanced text analysis and classification is not required.

Content URLs:


Speaker Info:

Senior BI Analyst with 8 years of experience in various IT technologies. Has worked in BI data analytics, ETL and data science for almost 8 years, with the chance to work in areas such as development, research and business analytics. Strong knowledge of data warehousing, ETL, Unix, statistical techniques, machine learning, NLP, deep learning and reinforcement learning, along with practical exposure to them.

Over a career span of 8 years, has had the opportunity to work in areas such as Data Warehousing, Data Science, Middleware, Full Stack, Microservices, Data Virtualization and Blockchain.

Current engagement at Altimetrik includes Data Warehousing, Data Science, Artificial Intelligence and Full Stack Development.

Speaker Links:

LinkedIn - www.linkedin.com/in/sakthiv
GitHub - https://github.com/vs-sakthi
Blog - http://sakthiemcee.blogspot.com/

Id: 1304
Section: Data Science, Machine Learning and AI
Type: Workshop
Target Audience: Intermediate
Last Updated: