Speed up Pandas using Modin and Ray
surya dev (~surya6)
Description:
The pandas library provides easy-to-use data structures such as the DataFrame, along with tools for data analysis. One issue with pandas is that it can be slow on large amounts of data: it wasn’t designed for analyzing 100 GB or 1 TB datasets. Fortunately, the Modin library lets you scale your pandas workflows by changing one line of code, and it integrates with the Python ecosystem and with Ray clusters.
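As a rough illustration of the "one line of code" change, the sketch below swaps the pandas import for Modin's drop-in module and leaves the rest of the workflow untouched. It assumes Modin is installed with the Ray engine (for example via `pip install "modin[ray]"`). The `summarize` helper, the column names, and the sample records are hypothetical and only there to make the example runnable; the fallback to plain pandas is likewise an addition for environments without Modin.

```python
# Assumption: Modin with the Ray engine is installed. The engine can also be
# selected explicitly by setting the MODIN_ENGINE environment variable to "ray".
try:
    import modin.pandas as pd  # the one-line change: was `import pandas as pd`
except ImportError:
    import pandas as pd  # fallback so this sketch still runs without Modin


def summarize(records):
    """Hypothetical helper: total units per machine, using the usual pandas API."""
    df = pd.DataFrame(records)
    return df.groupby("machine")["units"].sum().to_dict()


# Made-up sample data, for illustration only.
sample = [
    {"machine": "A", "units": 2},
    {"machine": "B", "units": 7},
    {"machine": "A", "units": 3},
]

totals = summarize(sample)
print(totals)
```

Everything after the import is ordinary pandas code, which is the point: the same calls run on Modin when it is available, with the work distributed by the configured engine.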
Prerequisites:
Basic understanding of Python and pandas
Speaker Info:
Surya is a senior data engineer on the Data and AI team at Körber, a market leader in AI-driven manufacturing and supply chain and number one innovation hub in Germany. Surya has multiple years of experience in Python and machine learning and is now looking more into the data engineering side. Surya is always looking to learn new things.