Identifying data blocks in CSV files using Image Processing with Python
Akhilesh Ravi (~akhilesh25)
Mammoth is a data management platform. We convert any given data item into a Mammoth-interpretable format that is connected to a Postgres backend. Our current approach to processing files and identifying data blocks is heuristic-based, which is time-consuming and does not scale easily to very large files. We are overhauling our file processing system to solve this problem with image processing techniques, applied to non-image feature matrices extracted from the data item. On top of this, we are adding a pattern recognition layer to improve performance.
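To make the idea of a non-image feature matrix concrete, here is a minimal sketch that turns CSV text into a grid of small integers, one per cell, much like pixel intensities. The cell-type codes (`EMPTY`, `NUMERIC`, `TEXT`) and the helper names are assumptions chosen for illustration; the features Mammoth actually extracts are not described here.

```python
import csv
import io

# Hypothetical cell-type codes, chosen for illustration only.
EMPTY, NUMERIC, TEXT = 0, 1, 2


def classify_cell(value):
    """Map one CSV cell to a small integer, like a pixel intensity."""
    value = value.strip()
    if not value:
        return EMPTY
    try:
        float(value)
        return NUMERIC
    except ValueError:
        return TEXT


def feature_matrix(csv_text):
    """Return a rectangular grid of cell-type codes for the given CSV text."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    width = max((len(row) for row in rows), default=0)
    # Pad short rows with EMPTY so the grid is rectangular, like an image.
    return [
        [classify_cell(cell) for cell in row] + [EMPTY] * (width - len(row))
        for row in rows
    ]


sample = "name,score\nalice,9.5\n,\nnotes\n"
matrix = feature_matrix(sample)
for row in matrix:
    print(row)
```

Once the file is in this form, rows and columns play the role of image coordinates, so operations designed for pixel grids can be tried on it directly.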
Outline of the talk
- Context - What is Mammoth? The current approach, and examples of files that it could not process.
- Image processing - Similarities between a non-image matrix and an image matrix, and why image processing techniques suit non-image data.
- Feature selection and pattern recognition in the given feature matrices.
- Walk-through of the algorithm
  - The image processing approach
  - Feeding the images into a pattern recognition model
  - Using feedback from users to fine-tune the model for better performance (feedback loop)
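As one illustration of how an image processing technique can locate data blocks, the sketch below runs connected-component labeling over a grid in which each truthy value marks a non-empty cell. This is an assumed stand-in rather than the algorithm presented in the talk, and `find_blocks` is a hypothetical helper written with the standard library only.

```python
from collections import deque


def find_blocks(mask):
    """Return bounding boxes (top, left, bottom, right) of 8-connected
    regions of truthy cells, ordered by first appearance in a row-major scan.
    Assumes every row of `mask` has the same length."""
    n_rows = len(mask)
    n_cols = len(mask[0]) if mask else 0
    seen = [[False] * n_cols for _ in range(n_rows)]
    blocks = []
    for r in range(n_rows):
        for c in range(n_cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill from this unvisited filled cell.
            top, left, bottom, right = r, c, r, c
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < n_rows and 0 <= nx < n_cols
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            blocks.append((top, left, bottom, right))
    return blocks


# Two separate tables on one sheet: 1 marks a non-empty cell.
grid = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
]
blocks = find_blocks(grid)
print(blocks)
```

Each bounding box is a candidate data block. A pattern recognition model could then classify or refine these candidates, which is where user feedback would come in.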
Who is this talk for?
Developers interested in image processing and pattern recognition
Anyone curious about lateral thinking in applying AI techniques and concepts from image processing
Prerequisite: basic knowledge of image processing
Akhilesh Ravi is a summer intern at Mammoth Analytics. He is pursuing a B.Tech. in Electrical Engineering and Computer Science (dual major) at IIT Gandhinagar. He is interested in machine learning, deep learning, image processing, and Python.
Divya Sivaram is an engineer who graduated from the University of Sussex with a Master's in Intelligent Systems. She is currently a Product Engineer at Mammoth.