Polyglot data with Python: Introducing Pandas and Apache Arrow
Robson Júnior (~bsao)
Nowadays Python is synonymous with data, but it is not necessarily the best choice for every data task. For example, exchanging data between different ecosystems is one of its challenges. Pandas and NumPy are efficient, de facto standard tools for handling a reasonable amount of data with good performance, but they are limited outside the Python ecosystem. Acquiring and exchanging data can be painful: it often means writing slow conversion code or generating unnecessarily large files, such as large CSV files, to talk to other ecosystems. Apache Arrow combined with Pandas is a great option for handling these problems, with excellent performance and native Python integration. This talk aims to show how to work in a heterogeneous environment, where data comes from another ecosystem, is handled inside the Python ecosystem, and is sent back to another ecosystem transparently.
Robson is a developer deeply involved with software communities, especially the Python community. Robson has been organizing conferences and meetups since 2011, and has been actively speaking at conferences since 2012 about Python and cloud technologies and since 2016 about data-related technologies. As an independent consultant, Robson also conducts on-demand architecture consultancy and training sessions on data-related technologies.