Speed up Pandas using Modin and Ray

surya dev (~surya6)




The pandas library provides easy-to-use data structures, such as the DataFrame, as well as tools for data analysis. One issue with pandas is that it can be slow on large amounts of data: most pandas operations run on a single CPU core, and the library was not designed for analyzing 100 GB or 1 TB datasets. Fortunately, the Modin library can help. It lets you scale your pandas workflows by changing a single line of code, and it integrates with the wider Python ecosystem and with Ray clusters.
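The "one line of code" is the import statement: Modin provides `modin.pandas` as a drop-in replacement for the pandas API. The sketch below is a minimal illustration, not the talk's own example. It assumes Modin is installed with the Ray engine (e.g. `pip install "modin[ray]"`) and falls back to plain pandas otherwise, so the same script runs either way. The `store`/`sales` data is made up for illustration.

```python
# The one-line change: swap the pandas import for Modin's drop-in replacement.
# If Modin is not installed, fall back to plain pandas so the rest of the
# script still runs unchanged.
try:
    import modin.pandas as pd  # partitions work across cores (Ray engine if installed)
except ImportError:
    import pandas as pd

# Everything below is ordinary pandas code; nothing else needs to change.
df = pd.DataFrame({
    "store": ["A", "B", "A", "C", "B", "A"],  # hypothetical example data
    "sales": [10, 20, 30, 40, 50, 60],
})

# Total sales per store; groupby sorts the group keys by default.
totals = df.groupby("store")["sales"].sum()
print(totals)
```

Because the rest of the code is untouched, the same script can be benchmarked with and without Modin simply by toggling that import. To run on an existing Ray cluster rather than the local machine, the Modin documentation describes calling `ray.init(address="auto")` before creating any DataFrames.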


Prerequisites: basic understanding of Python and pandas.

Speaker Info:

Surya is a senior data engineer in the Data and AI team at Körber, a market leader in AI-driven manufacturing and supply chain and the number one innovation hub in Germany. With several years of experience in Python and machine learning, Surya is now focusing more on the data engineering side and is always looking to learn new things.

Section: Distributed Computing
Type: Talks
Target Audience: Intermediate
Last Updated: