Auto-detection of tags and text classification on unstructured data with Python
Swathi Tatavarthy (~swathi14)
With the volume of data growing substantially every day, manual labeling has become a tedious task. This has made text analysis an emerging field of study. E-commerce platforms, social media, and news agencies already analyze and extract textual information from different types of data. Text classification is one of the essential parts of text analysis. In general, text categorization generates tags from unstructured data and assigns them to predefined categories.
This approach can be applied in many contexts, ranging from document filtering to automated metadata generation, word sense disambiguation, processing of OCR data, and any application that requires efficient organization of documents. It improves search efficiency, so results can be retrieved in a fraction of a second.
Auto-detection of tags can be done using the Python libraries gensim and NLTK. With five or six lines of Python code, we can generate tags for a given document and classify the document into domains using WordNet hypernyms.
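The idea of tagging a document and then mapping its tags to a broader domain can be sketched as follows. This is only an illustration under stated assumptions, not the poster's actual implementation: tags are picked by plain word frequency rather than with gensim, and `TOY_HYPERNYMS` is a hypothetical hand-made table standing in for WordNet so that the example runs with the standard library alone. The names `extract_tags`, `classify`, `toy_hypernym`, and `wordnet_hypernym` are made up for this sketch. The optional `wordnet_hypernym` helper shows how the lookup might be swapped for NLTK's WordNet interface, assuming `nltk` is installed and its `wordnet` corpus has been downloaded.

```python
import re
from collections import Counter

# Small illustrative stop-word list (an assumption; NLTK ships a fuller one).
STOP_WORDS = {"the", "and", "also", "a", "an", "of", "to", "in", "is", "for"}

# Hypothetical hand-made hypernym table standing in for WordNet.
TOY_HYPERNYMS = {
    "cricket": "sport",
    "football": "sport",
    "tennis": "sport",
    "guitar": "music",
    "piano": "music",
}


def extract_tags(text, top_n=3):
    """Return the top_n most frequent non-stop-words as candidate tags."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS and len(w) > 2)
    return [word for word, _ in counts.most_common(top_n)]


def toy_hypernym(word):
    """Look up a broader term in the hand-made table (None if unknown)."""
    return TOY_HYPERNYMS.get(word)


def wordnet_hypernym(word):
    """Optional alternative: first hypernym of the first noun sense in WordNet.

    Requires nltk and its 'wordnet' corpus; not called in this sketch.
    """
    from nltk.corpus import wordnet

    synsets = wordnet.synsets(word, pos=wordnet.NOUN)
    if not synsets:
        return None
    hypernyms = synsets[0].hypernyms()
    return hypernyms[0].lemma_names()[0] if hypernyms else None


def classify(tags, lookup=toy_hypernym):
    """Map each tag to a broader term and return the most common one."""
    domains = Counter(d for d in map(lookup, tags) if d)
    return domains.most_common(1)[0][0] if domains else None


doc = (
    "The cricket team won the cricket final. "
    "Football and cricket fans also follow football and tennis."
)
tags = extract_tags(doc)
domain = classify(tags)
print(tags, domain)
```

Passing `lookup=wordnet_hypernym` to `classify` would use WordNet instead of the toy table. The first hypernym of the first sense is a crude choice, so a real classifier would likely need word sense disambiguation and a walk further up the hypernym chain.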
- Basics of Python
- Basic understanding of Natural Language Processing
I am a Software Engineer and a Python enthusiast. With the advent of machine learning and AI, I became fascinated by the insights that can be generated from data. Python's rich library support for machine learning has made me want to dig deeper into it. I have been part of building various AI products using NLP and predictive analytics on cloud platforms. This poster focuses on natural language processing of textual data using Python libraries.