Dask: Distributed Data Science in a Pythonic way

smit thakkar (~smitthakkar96)




Dask is a general-purpose parallel computing system, written in pure Python, capable of Celery-like task scheduling, Spark-like big data computing, and complex algorithms at the level of NumPy, Pandas, and scikit-learn. Dask has been adopted by the PyData community as a big data solution. This talk focuses on the distributed task scheduler that powers Dask when running on a cluster. We will start by comparing Dask with the other solutions available for big data ETL and analytics. We will then talk about how easily you can parallelize the workloads you run with your favourite SciPy-ecosystem libraries, e.g. NumPy and Pandas. Lastly, we will talk about how you can integrate Dask with your existing code and parallelize its workload.
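As a rough illustration of the last two points, here is a minimal sketch, not taken from the talk's demos. It assumes the dask and numpy packages are installed and runs on Dask's default local scheduler rather than a cluster. The functions load, clean and total are hypothetical stand-ins for existing code.

```python
import dask
import dask.array as da
import numpy as np


# Wrapping ordinary functions with dask.delayed makes calls build a task
# graph instead of running immediately. These three functions are
# hypothetical placeholders for existing code.
@dask.delayed
def load(i):
    # Pretend to read one chunk of ten integers.
    return np.arange(i * 10, (i + 1) * 10)


@dask.delayed
def clean(chunk):
    # Keep only the even values.
    return chunk[chunk % 2 == 0]


@dask.delayed
def total(chunks):
    # Combine the per-chunk results into one number.
    return int(sum(c.sum() for c in chunks))


# Nothing runs here; Dask only records the dependencies between tasks.
graph = total([clean(load(i)) for i in range(4)])

# compute() hands the graph to a scheduler, which can run the
# independent load/clean tasks in parallel.
result = graph.compute()
print(result)  # 380

# The array collection mirrors a subset of the NumPy API, but splits the
# data into chunks that are processed as separate tasks.
x = da.arange(40, chunks=10)
squares = int((x ** 2).sum().compute())
print(squares)  # 20540
```

The same graph could in principle be sent to a cluster by connecting to a distributed scheduler before calling compute(); that setup is left out here because it depends on the deployment.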


  • Good understanding of Python programming
  • Should have used at least one SciPy-ecosystem library before
  • Nice to have some idea regarding the big data tools available for analytics and ETL

Content URLs:


PS: This is a first draft; I need to organize it better and improve the demos.

Speaker Info:

I am an enthusiastic developer and aspiring entrepreneur with a particular passion for the intersection of web development and emerging technologies. I am constantly exploring innovative ways to solve real-world problems and improve existing solutions. I genuinely enjoy working with people, taking risks, and developing new applications.

I am currently working at Dubizzle as an Associate Software Engineer. Previously I worked at Corridor Funds as a Technology Architect, where I built and architected a data-driven loan valuation and portfolio management tool for retail and institutional lenders. I am an open source contributor at Gluster, FOSS Asia, NGUI and GDG. Previously I led a GDG chapter in Gujarat.

I have also spoken at tech meetups and conferences such as Women Techmakers, Google DevFest, Google Cloud Next Extended, Mozilla Gujarat, local GDGs and Startup Gujarat. In addition, I am always experimenting with new and interesting side projects.

Speaker Links:

  • GitHub: http://github.com/smitthakkar96
  • LinkedIn: http://linkedin.com/in/smitthakkar96

Id: 656
Section: Data science
Type: Talks
Target Audience: Intermediate
Last Updated: