Rock and roll with Dask (Scalable analytics in Python)

Suman Debnath (~debnsuma)


pandas is one of the most popular tools in Python's data science stack, but there will be times when you outgrow it: for example, when you need to use all 32 cores on your system rather than just one to get an answer faster, or when you have more data than will fit in your RAM, or even on your disk. That's where Dask comes to the rescue. Its API is broadly compatible with pandas, but it scales Python computation across cores and across computers, bringing you fast analysis of data that exceeds what any single computer can handle. Dask is a free and open source library that helps scale your data science workflows and provides a framework for distributed computing in Python. This session will get you up to speed with Dask and show you how to convert pandas workloads to run on Dask clusters (locally across cores or scaled out across cloud servers).


Basics of Python; basics of NumPy or pandas

Speaker Info:

Suman Debnath is a Principal Developer Advocate at Amazon Web Services, based in India. He works to simplify the intricacies of AWS cloud services for developers and helps them get the most out of those services in their applications. His key focus areas are Python, machine learning, data analytics, and storage.

Speaker Links:

Section: Data Science, Machine Learning and AI
Type: Talks
Target Audience: Beginner
Last Updated: