OCR in Indic Scripts - An Introduction to utilizing CTC Loss in Indian Scripts

Abhilash Pal (~abhilash57)


Description:

Designing an Optical Character Recognition (OCR) engine with deep learning Seq2Seq models involves three tasks: first, segmenting the image at line or word level; then, encoding the text for that line or word as vectors; and finally, training the model with a CNN-RNN-CTC system.
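As a rough illustration of the first step, line segmentation can be sketched with a horizontal projection profile: rows containing ink belong to a text line, and empty rows separate lines. This is a common baseline technique and not necessarily the method used in the poster. The function name `segment_lines`, the `min_ink` threshold, and the toy binary image below are assumptions made for this example only.

```python
def segment_lines(image, min_ink=1):
    """Return (top, bottom) row ranges, bottom exclusive, one per text line.

    `image` is a list of rows; each row is a list of 0 (background) or 1 (ink).
    A row counts as part of a line when it holds at least `min_ink` ink pixels.
    """
    lines = []
    start = None  # row index where the current line began, if any
    for y, row in enumerate(image):
        has_ink = sum(row) >= min_ink
        if has_ink and start is None:
            start = y                      # a new text line begins
        elif not has_ink and start is not None:
            lines.append((start, y))       # the current line just ended
            start = None
    if start is not None:                  # line touching the bottom edge
        lines.append((start, len(image)))
    return lines


# Toy 7x4 "page" with two text lines separated by blank rows.
page = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 0, 0],
]
print(segment_lines(page))  # [(1, 3), (5, 6)]
```

On real scans the image would first be binarized (for example with OpenCV), and skewed or touching lines would need more than this simple profile.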

While this is straightforward for the Latin script (English), it can be a problem for Bangla, Tamil, or Gurmukhi, mainly because of how these scripts are encoded in Unicode: a single visible character can span several code points. An OCR for Indian scripts therefore requires a proper way to encode the individual characters and affixes before moving on to the model.
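The encoding issue can be seen with a few lines of standard-library Python. The sketch below prints the code points of one Bangla word and then groups them into visual units with a simplified rule: attach combining marks, and anything that follows a hasanta (virama), to the preceding unit. The helper `split_clusters` is hypothetical and only approximates proper grapheme segmentation; it is not necessarily the encoding scheme used in the poster.

```python
import unicodedata

BENGALI_VIRAMA = "\u09cd"  # hasanta: joins consonants into a conjunct


def split_clusters(word):
    """Group code points into approximate visual units (simplified rule)."""
    clusters = []
    for ch in word:
        # Vowel signs and other dependent signs have a Unicode category
        # starting with "M" (Mn or Mc).
        is_mark = unicodedata.category(ch).startswith("M")
        follows_virama = bool(clusters) and clusters[-1].endswith(BENGALI_VIRAMA)
        if clusters and (is_mark or follows_virama):
            clusters[-1] += ch   # extend the current unit
        else:
            clusters.append(ch)  # start a new unit
    return clusters


word = "\u09b6\u09bf\u0995\u09cd\u09b7\u09be"  # শিক্ষা ("education")
for ch in word:
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")

units = split_clusters(word)
print(len(word), "code points,", len(units), "units")  # 6 code points, 2 units
```

A model that predicts raw code points must emit six labels for this word, while one that predicts grouped units emits two labels drawn from a much larger label set. That trade-off is the encoding decision described above.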

Furthermore, training such a model is somewhat harder than training one for Latin, mainly because of the large number of characters and affixes these scripts contain. This poster aims to offer insight into an extremely important but relatively neglected area of deep learning research: the recognition of indigenous scripts.

It will cover the following points:

  • Dataset Generation using Python
  • Document Segmentation
  • Encoding Indian Languages (with examples in Bangla and possibly Tamil)
  • Model Architecture and Training
  • CTC Loss Module

The CTC loss part will be covered in detail for anyone who wants to know how the loss layer functions in OCR and speech recognition systems (both use a similar CTC loss layer). The poster will conclude with a summary and a discussion of how further work can be undertaken in this domain.
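For readers new to CTC, here is a minimal pure-Python sketch of its two core ideas: collapsing a per-frame label sequence into an output (merge repeats, then drop blanks), and summing the probability of every frame alignment that collapses to the target (the forward algorithm). It assumes the blank symbol has index 0 and works directly with probabilities for readability; practical implementations, such as those in deep learning frameworks, work in log space for numerical stability. The function names are made up for this example.

```python
import math

BLANK = 0  # assumed index of the CTC blank symbol


def ctc_greedy_decode(frame_labels):
    """Collapse repeated labels, then drop blanks."""
    out, prev = [], None
    for k in frame_labels:
        if k != prev and k != BLANK:
            out.append(k)
        prev = k
    return out


def ctc_neg_log_likelihood(probs, target):
    """CTC loss for one sample.

    probs[t][k] is the probability of label k at frame t;
    target is a list of label ids without blanks.
    """
    # Interleave blanks: [a, b] -> [BLANK, a, BLANK, b, BLANK]
    ext = [BLANK]
    for k in target:
        ext += [k, BLANK]
    S = len(ext)

    # alpha[s]: total probability of all partial alignments ending at ext[s]
    alpha = [0.0] * S
    alpha[0] = probs[0][ext[0]]
    if S > 1:
        alpha[1] = probs[0][ext[1]]

    for t in range(1, len(probs)):
        new = [0.0] * S
        for s in range(S):
            total = alpha[s]                      # stay on the same symbol
            if s >= 1:
                total += alpha[s - 1]             # advance by one
            if s >= 2 and ext[s] != BLANK and ext[s] != ext[s - 2]:
                total += alpha[s - 2]             # skip the blank in between
            new[s] = total * probs[t][ext[s]]
        alpha = new

    # Valid alignments end on the last label or on the trailing blank.
    p = alpha[-1] + (alpha[-2] if S > 1 else 0.0)
    return -math.log(p)


print(ctc_greedy_decode([1, 1, 0, 1, 2, 2, 0]))  # [1, 1, 2]

# Two frames, two symbols (blank and label 1), uniform predictions.
# Alignments for target [1]: "11", "1-", "-1", so p = 3 * 0.25 = 0.75.
uniform = [[0.5, 0.5], [0.5, 0.5]]
loss = ctc_neg_log_likelihood(uniform, [1])
print(round(math.exp(-loss), 4))  # 0.75
```

Because the loss sums over all alignments, the training data only needs the text of each line or word, not the position of each character in the image.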

Prerequisites:

The main prerequisites for my presentation are:

  • A Basic Understanding of Python Syntax and Machine Learning
  • Familiarity with Deep Learning Models, particularly Seq2Seq models.
  • Interest in the fields of Computer Vision and Sequence Modeling.
  • An Understanding of Keras, NumPy and OpenCV will help too.

Content URLs:

Slides on which the poster will be based : https://www.slideshare.net/secret/AJmak8W2r4pv4U

GitHub Repo for Code : https://github.com/AbhilashPal/BanglaOCR

Video Proposal : https://youtu.be/WLXmGt2Ziak

Speaker Info:

I am currently a final-year undergraduate student at SRM University, Chennai. My interests include Artificial Intelligence, Machine Learning, Deep Learning, and NLP. I formerly led the DS&ML Domain of Club Gen-Y and have completed several short machine learning/NLP projects at hackathons, winning a couple of them.

I worked with Dr. Utpal Garain at ISI Kolkata during my summer break in June 2019 on an OCR for Bangla script. My presentation will draw on key insights from that work and illustrate the difficulties others might face when building an OCR for their own mother tongue.

More details can be found on my LinkedIn : https://www.linkedin.com/in/abhilashpal/

Speaker Links:

Personal Site : http://abhilashpal.github.io

GitHub : https://github.com/AbhilashPal

LinkedIn : https://www.linkedin.com/in/abhilashpal/

Blog : https://medium.com/@abhilashpal8

Section: Data Science, Machine Learning and AI
Type: Poster
Target Audience: Intermediate
Last Updated: