OCR for Indian Documents

kunal kumar (~kunal6) | 22 Apr, 2024


Description:

Hello People,

A few months of research into Indian identity documents have shown that automation for these documents remains limited in scope. Although there are plenty of tools for OCRing documents, the results are not satisfactory.

Most government agencies use high-cost, limited tools for their operations, which are neither economical nor reliable. Most of the data extracted for these agencies comes from an individual's Aadhaar and PAN. It might come as a surprise, though, that the Indian government assigns 15+ identity documents to a citizen over the course of their life, from birth to death. Most FinTech companies and other large organisations have their teams do this extraction manually as grunt work. Extracting data from most of these documents, and verifying its accuracy, is not yet simple enough to be automated and used efficiently. It is worth noting that some technical advances have been made: there are APIs to extract data from Aadhaar documents, but they are costly, time-consuming, or both.
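To give a feel for the kind of post-processing involved, here is a minimal sketch that pulls PAN and Aadhaar numbers out of text an OCR engine has already produced. It is only an illustration, not the approach from the talk: the function name `extract_ids` and the sample text are hypothetical, and the patterns assume the commonly printed layouts (PAN as five letters, four digits, one letter; Aadhaar as 12 digits, often in groups of four, not starting with 0 or 1). It does not verify the Aadhaar checksum.

```python
import re

# Assumed PAN layout: five letters, four digits, one letter (e.g. ABCDE1234F).
PAN_RE = re.compile(r"\b[A-Z]{5}[0-9]{4}[A-Z]\b")

# Assumed Aadhaar layout: 12 digits, optionally split into groups of four
# by a space or hyphen, with a first digit between 2 and 9.
AADHAAR_RE = re.compile(r"\b[2-9][0-9]{3}[ -]?[0-9]{4}[ -]?[0-9]{4}\b")


def extract_ids(ocr_text):
    """Return PAN and Aadhaar candidates found in raw OCR text.

    Hypothetical helper: it only matches patterns and does no
    checksum or database validation.
    """
    text = ocr_text.upper()  # OCR output is often inconsistently cased
    pans = PAN_RE.findall(text)
    # Strip the group separators so every Aadhaar candidate is 12 plain digits.
    aadhaars = [re.sub(r"[ -]", "", m) for m in AADHAAR_RE.findall(text)]
    return {"pan": pans, "aadhaar": aadhaars}


# Made-up sample text standing in for the output of any OCR engine.
sample = "Name: A Kumar\nPAN: abcde1234f\nAadhaar No 2345 6789 0123"
result = extract_ids(sample)
print(result)
```

Pattern matching like this only catches well-formed numbers; misread characters (for example `O` for `0`) are exactly where the accuracy problems described above show up.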

From what the results have shown us so far, open source seems to be the only likely answer. There are promising ML/AI models at our disposal. If researched and put to use by a group of individuals, these models can significantly improve the quality of the data that can be extracted from any of these documents.

Content URLs:

https://slides.com/kunalkumar-1/ocr-indian-docs

Speaker Info:

Kunal Kumar Kushwaha
engineering lead @ essentia.dev
7+ years of experience

Speaker Links:

https://github.com/nerdyk3
https://erkunal.in/
https://www.linkedin.com/in/coderk3
email: kunal@essentia.dev

Section:	Core Python
Type:	Talk
Target Audience:	Intermediate
Last Updated:	05 Jun, 2024
