Identifying data blocks in CSV files using Image Processing with Python

Akhilesh Ravi (~akhilesh25)


Description:

Mammoth is a data management platform. We convert any given data item into a Mammoth-interpretable format backed by a Postgres database. Our current approach to processing files and identifying data blocks is heuristic-based, which is time-consuming and does not scale easily to very large files. We are overhauling our file processing system to use image processing techniques instead, applying them to non-image feature matrices extracted from the data item. On top of this, we are adding a pattern recognition layer to improve performance.
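To make the idea of a non-image feature matrix concrete, here is a minimal sketch in which each CSV cell becomes one "pixel". The encoding (0 for empty, 1 for numeric, 2 for text) and the names `cell_feature` and `feature_matrix` are assumptions made for illustration; the features Mammoth actually extracts are not described here.

```python
import csv
import io


def cell_feature(value: str) -> int:
    """Map a cell to a small integer 'pixel' value (hypothetical encoding).

    0 = empty, 1 = parses as a number, 2 = any other text.
    """
    text = value.strip()
    if not text:
        return 0
    try:
        float(text)
        return 1
    except ValueError:
        return 2


def feature_matrix(csv_text: str) -> list[list[int]]:
    """Build a rectangular matrix of cell features from raw CSV text."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    width = max((len(row) for row in rows), default=0)
    # Pad short rows with 0 (empty) so every row has the same width,
    # just as every row of an image has the same number of pixels.
    return [
        [cell_feature(row[col]) if col < len(row) else 0 for col in range(width)]
        for row in rows
    ]


sample = "name,age\nAda,36\n,\nnote\n"
matrix = feature_matrix(sample)
for row in matrix:
    print(row)
```

Once the file is in this form, it can be treated like a small grayscale image, which is what makes image processing operations applicable to it.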

Outline of the talk

  1. Context - What is Mammoth? Current approach; Examples of files that could not be processed.
  2. Image processing - Similarities between a non-image matrix and an image matrix - Why choose image processing techniques for non-image data?
  3. Feature selection and pattern recognition in the given feature matrices.
  4. Walk-through of the algorithm - The image processing approach - Feeding the images into a pattern recognition model - Using feedback from users to fine-tune the model for better performance (feedback loop).
  5. Q&A
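As one illustration of how an image processing technique can locate data blocks, the sketch below runs connected-component labeling on a small occupancy grid (1 = non-empty cell, 0 = empty cell) and returns a bounding box per block. This is a standard technique chosen for illustration and is not necessarily the algorithm presented in the talk; the function name `find_blocks` and the sample grid are hypothetical.

```python
from collections import deque


def find_blocks(grid):
    """Return bounding boxes (top, left, bottom, right) of 4-connected
    regions of non-zero cells. Assumes a rectangular grid."""
    n_rows = len(grid)
    n_cols = len(grid[0]) if grid else 0
    seen = [[False] * n_cols for _ in range(n_rows)]
    boxes = []
    for r in range(n_rows):
        for c in range(n_cols):
            if grid[r][c] == 0 or seen[r][c]:
                continue
            # Breadth-first flood fill from this unvisited non-empty cell.
            queue = deque([(r, c)])
            seen[r][c] = True
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (
                        0 <= ny < n_rows
                        and 0 <= nx < n_cols
                        and grid[ny][nx] != 0
                        and not seen[ny][nx]
                    ):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


# Hypothetical sheet: a 2x2 table in the top-left corner and a
# one-row table in the bottom-right corner, separated by empty cells.
grid = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 1, 1],
]
blocks = find_blocks(grid)
print(blocks)
```

Each bounding box marks a candidate data block. A real file would likely need extra steps, such as bridging small gaps inside a table, before the boxes line up with what a user considers one block.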

Who is this talk for?

Developers interested in image processing and pattern recognition


Key takeaways

Lateral thinking in applying AI techniques and concepts around image processing

Prerequisites:

Basic knowledge of image processing

Speaker Info:

Akhilesh Ravi

Akhilesh Ravi is a summer intern at Mammoth Analytics. He is pursuing a B.Tech. in Electrical Engineering and Computer Science (dual major) at IIT Gandhinagar. He is interested in Machine Learning, Deep Learning, Image Processing, and Python.


Divya Sivaram

Divya Sivaram is an engineer who graduated from the University of Sussex with a Master's in Intelligent Systems. She is currently a Product Engineer at Mammoth.

Speaker Links:

Akhilesh Ravi


Divya Sivaram

Section: Data Science, Machine Learning and AI
Type: Talks
Target Audience: Intermediate
Last Updated: