Out-of-core Analytics Using Python
Gopi Suvanam (~gopi32)
Most analytics packages in Python store data and perform analysis and modeling in memory. This limits their use on data that does not fit in memory. As a result, many data scientists are forced to use non-Python tools (such as Spark) to analyse larger data sets. We propose an architecture that combines column-stores on disk with out-of-core processing in Python to circumvent this problem. The talk will focus on how one can handle descriptive analytics and machine learning using a basic set of primitive calculations. The talk will also highlight an implementation using MonetDB as the column-store. The analysis that will be covered includes:
- Search
- Aggregations
- Filter
- Generalized linear models
- Decision trees
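To illustrate what "a basic set of primitive calculations" could look like, here is a minimal sketch, not the implementation presented in the talk. It fits a one-variable linear model using only aggregates (a count and four sums) that the data store computes, so the raw rows never need to be loaded into Python memory. The function name `fit_line_out_of_core`, the table `obs`, and its columns are hypothetical, and SQLite from the standard library stands in for a column-store such as MonetDB purely so that the example is self-contained.

```python
import sqlite3


def fit_line_out_of_core(conn, table, x_col, y_col):
    """Fit y = slope * x + intercept from aggregates computed by the store.

    Hypothetical helper for illustration. Table and column names are
    interpolated into the SQL text, so they must come from trusted code.
    """
    query = (
        f"SELECT COUNT(*), SUM({x_col}), SUM({y_col}), "
        f"SUM({x_col} * {y_col}), SUM({x_col} * {x_col}) FROM {table}"
    )
    # Only five numbers cross from the store into Python.
    n, sx, sy, sxy, sxx = conn.execute(query).fetchone()
    if n < 2:
        raise ValueError("need at least two rows to fit a line")
    denom = n * sxx - sx * sx
    if denom == 0:
        raise ValueError("x has zero variance; slope is undefined")
    # Closed-form least-squares solution from the sufficient statistics.
    slope = (n * sxy - sx * sy) / denom
    intercept = (sy - slope * sx) / n
    return slope, intercept


# Stand-in data store: an in-memory SQLite table (assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE obs (x REAL, y REAL)")
conn.executemany(
    "INSERT INTO obs VALUES (?, ?)",
    [(float(i), 2.0 * i + 1.0) for i in range(1000)],
)
slope, intercept = fit_line_out_of_core(conn, "obs", "x", "y")
print(slope, intercept)
```

The same pattern extends to several features by asking the store for every pairwise sum of products and solving the resulting normal equations in Python; the memory needed on the Python side then depends on the number of columns, not the number of rows.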
The talk will be interesting for data scientists who want to use Python to analyse and build models on large sets of structured data.
Type of Talk
The talk will be based on experience of building data products using Python. Its key focus will be practical learning.
Machine learning, analytics, column-stores.
Gopi Suvanam is the co-founder of G-Square Solutions. G-Square provides analytics products built using Python. Gopi leads technology, delivery and product development for G-Square. He has more than 10 years of experience in financial analytics. He worked for four years at Deutsche Bank in NY and London before venturing into entrepreneurship. He has an MBA from the Indian Institute of Management (IIM) Ahmedabad and a Bachelor’s degree in Computer Science from IIT Madras. Gopi takes a keen interest in poetry, literature, number theory and international economics.