IndicNLP: Natural Language Processing for Indian Languages using Python

Gajendra Deshpande (~gcdeshpande)


Description:

India is a diverse country with 22 major languages written in 13 different scripts and over 720 dialects. Indian languages are complex and pose unique challenges for NLP practitioners. In addition, many Indians are multilingual and commonly mix words from different languages. Natural Language Processing for English is a mature area in which many breakthroughs have been achieved. For Indian languages, however, there are still many opportunities for researchers and developers to explore. In this workshop, we will perform Natural Language Processing tasks on a few Indian languages, including cross-language processing and the processing of transliterated and translated text. Then, I will demonstrate how to build an NLP application that processes Indian languages. Finally, we will conclude the workshop by exploring the future scope of NLP for Indic languages.
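To make the point about mixed-language text concrete, here is a minimal sketch (not taken from the workshop material) that tags each word of a sentence with the script it is written in. It uses only the Python standard library and assumes that the first word of a character's Unicode name identifies its script; the helper names `word_script` and `tag_scripts` are hypothetical.

```python
import unicodedata
from collections import Counter


def word_script(word):
    """Return the dominant script of a word, e.g. 'DEVANAGARI' or 'LATIN'.

    Assumption: the first word of the Unicode character name names the script.
    Only letters (L*) and combining marks (M*) are counted, so digits and
    punctuation do not influence the result.
    """
    counts = Counter(
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in word
        if unicodedata.category(ch)[0] in ("L", "M")
    )
    return counts.most_common(1)[0][0] if counts else "OTHER"


def tag_scripts(sentence):
    """Split on whitespace and pair every word with its dominant script."""
    return [(word, word_script(word)) for word in sentence.split()]


if __name__ == "__main__":
    # A Hindi sentence with two English words mixed in.
    for word, script in tag_scripts("मुझे weekend पर movie देखनी है"):
        print(word, script)
```

Script detection is only a first step: Hindi typed in Latin letters (transliterated text) would be tagged as `LATIN`, so identifying the language of such words needs the dedicated tools covered in the workshop.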

Outline of the workshop

  1. Introduction to NLP and example applications (05 Minutes)
  2. Introduction to IndicNLP: opportunities and challenges (05 Minutes)
  3. Python packages for NLP for Indian Languages (05 Minutes)
  4. Setting up the environment for the workshop (05 Minutes)
  5. Tokenization (05 Minutes)
  6. Word embeddings (05 Minutes)
    BREAK (05 Minutes)
  7. Text completion (05 Minutes)
  8. Similarity of Sentences (05 Minutes)
  9. Normalization (05 Minutes)
  10. Transliteration (05 Minutes)
  11. Phonetic analysis (05 Minutes)
  12. Syllabification (05 Minutes)
    BREAK (05 Minutes)
  13. Lemmatization (05 Minutes)
  14. Part of Speech tagging (05 Minutes)
  15. Translation (05 Minutes)
  16. Dependency parsing (05 Minutes)
  17. Demonstration of IndicCorp: processing corpus text (10 Minutes)
    BREAK (05 Minutes)
  18. Demonstration of IndicFT: subword aware word embedding model (10 Minutes)
  19. Demonstration of IndicBERT (10 Minutes)
  20. Demonstration of IndicGLUE (10 Minutes)
  21. Demonstration of Polyglot (10 Minutes)
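
As a rough illustration of two of the outline topics (tokenization and transliteration between Indic scripts), the following sketch uses only the Python standard library. It is not the workshop's code and does not use the packages listed above. The function names `tokenize` and `convert_script` and the `SCRIPT_BASE` table are hypothetical. The script conversion relies on the fact that the major Indic script blocks in Unicode share a common layout, so a character can often be mapped by shifting its code point; this is an approximation that ignores script-specific exceptions and may yield unassigned code points for some characters.

```python
import re

# Punctuation treated as separate tokens, including the danda (U+0964)
# and double danda (U+0965) used as sentence terminators in many Indic scripts.
_PUNCT = "\u0964\u0965,.?!;:\"'()"
_TOKEN_RE = re.compile(
    "[^\\s" + re.escape(_PUNCT) + "]+|[" + re.escape(_PUNCT) + "]"
)

# Start of the Unicode block for each script (each block spans 0x80 code points).
SCRIPT_BASE = {
    "devanagari": 0x0900,
    "bengali": 0x0980,
    "gurmukhi": 0x0A00,
    "gujarati": 0x0A80,
    "oriya": 0x0B00,
    "tamil": 0x0B80,
    "telugu": 0x0C00,
    "kannada": 0x0C80,
    "malayalam": 0x0D00,
}


def tokenize(text):
    """Split text into word and punctuation tokens.

    Python's \\w does not match Indic vowel signs, so words are defined
    here as runs of characters that are neither whitespace nor punctuation.
    """
    return _TOKEN_RE.findall(text)


def convert_script(text, source, target):
    """Approximate script conversion by shifting code points between blocks."""
    src, tgt = SCRIPT_BASE[source], SCRIPT_BASE[target]
    out = []
    for ch in text:
        offset = ord(ch) - src
        # Offsets 0x64 and 0x65 are the danda marks, which are shared
        # across scripts and therefore left unchanged.
        if 0 <= offset < 0x80 and offset not in (0x64, 0x65):
            out.append(chr(tgt + offset))
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    print(tokenize("मैं घर जा रहा हूँ। तुम कहाँ हो?"))
    print(convert_script("भारत", "devanagari", "kannada"))
```

The libraries covered in the workshop handle the cases this sketch skips, such as characters that exist in one script but not in another.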

Intended audience

Anyone interested in performing Natural Language Processing tasks, especially with Indian languages, and in exploring the opportunities in this area. This workshop will be beginner-friendly.

Why should someone attend this workshop? What will they get at the end of it?

This is a unique opportunity for anyone who wants to perform Natural Language Processing for Indian languages. Participants will learn how to apply NLP to their native language, covering both its written form and its grammar. In addition, participants will learn to perform cross-lingual NLP tasks and to work with transliterated and translated text. Participants will learn four Python packages: iNLTK, IndicNLP, Stanza (previously StanfordNLP), and Polyglot.

Prerequisites:

Basic knowledge of Python is sufficient. A basic understanding of Natural Language Processing concepts will be advantageous.

Video URL:

https://www.youtube.com/watch?v=yr-h9XGyk9o

Content URLs:

  1. Slides URL: https://drive.google.com/file/d/1RiKv-zhsHhUuMi6VCP9iaNFOVBida14N/view?usp=sharing

  2. The example code will be made available to participants in the following repository: https://github.com/gcdeshpande/IndicNLP

  3. Setup Instructions https://github.com/gcdeshpande/IndicNLP/blob/main/README.md

  4. All participants will use GitHub and https://repl.it to execute the programs. Alternatively, participants can set up the environment and execute the programs locally, or use Kaggle Notebooks.

Speaker Info:

I hold an M.Tech. in Computer Science and Engineering and a PG Diploma in Cyber Law and Cyber Forensics from National Law School of India University, Bengaluru, India. I have presented talks, posters, and papers at prestigious conferences including JuliaCon, London, PyCon France, PyCon Hong Kong, PyCon Taiwan, COSCUP Taiwan, PyCon Africa, BuzzConf Argentina, EuroPython, PiterPy Russia, SciPy USA, SciPy India, NIT Goa, and IIT Gandhinagar. I have worked as a reviewer and program committee member for reputed international conferences including SciPy USA, SciPy Japan, JuliaCon, JupyterCon, PyData Global, and PyCon India, and as a reviewer for publishers including Manning USA and Oxford University Press. I am also a GitHub Certified Campus Advisor. I lead the PyData Belagavi chapter and the OWASP Belagavi chapter.

Speaker Links:

My Personal web page https://gcdeshpande.github.io/

GitHub https://github.com/gcdeshpande

Google Scholar https://scholar.google.co.in/citations?user=yl0uzFsAAAAJ&hl=en

Section: Data Science, Machine Learning and AI
Type: Workshop
Target Audience: Beginner
Last Updated: