Large scale data processing with Map Reduce and AWS Lambda

Shalabh Aggarwal (~shalabhaggarwal)


Data processing is one of the most important pillars of today's software engineering industry. Processing data at large scale is a harder problem still, especially when it must be done in an optimised and cost-effective manner. A multitude of tools is available, but architecting them so that every component scales up (and back down) efficiently is the main challenge for engineers working on data processing today. Python is one of the preferred languages for writing data processing algorithms, so this workshop will focus on the following:

  • Use Python to write data processing programs.
  • Get a basic overview of, and hands-on experience with, libraries such as pandas.
  • Learn what map reduce is and the concept behind it.
  • Understand the AWS services that can be leveraged for map reduce.
  • Architect a completely automated pipeline for data processing.
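
To give a flavour of the map reduce concept, here is a minimal sketch in plain Python that counts words across a few lines of text. It is only an illustration of the map, shuffle and reduce phases on a single machine, not the workshop's actual material and not how EMR runs jobs internally; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample input are assumptions made for this example.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence over an iterable of lines."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical sample input, used only for illustration.
sample_lines = ["big data big pipelines", "data at scale"]
counts = word_count(sample_lines)
print(counts)
```

In a distributed setting the map and reduce functions stay this small; what changes is that the framework runs many copies of them in parallel and performs the shuffle across machines.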

By the end of this workshop, attendees should be able to create EMR clusters on AWS and write programs that use them to process data of any scale. Best of all, they will be able to automate the whole pipeline and watch the processing happen while sipping coffee.


Prerequisites:

  • Laptop
  • Python 3 installed
  • An IDE
  • Understanding of Python basics
  • An AWS account (preferred), for hands-on work with AWS services

Content URLs:

Speaker Info:

I have around 10 years of experience developing business systems and mobile and web applications for small-to-large scale industries. I started my career with Python and, although I work with multiple technologies, I remain a Python developer at heart. I am passionate about open source technologies and about writing readable, high-quality code. I am also the author of Flask Framework Cookbook, which covers various aspects of developing web applications with the Python-based Flask framework.

Speaker Links:

  • Flask Framework Cookbook -
    • The second edition will be published before PyCon 2019
  • Blogs -
    • 13 blog posts here
  • GitHub -
    • Many repos are just forks
    • For some, I have made small contributions such as bug fixes or small additional features
    • For a few, I am one of the initial core contributors
    • Some are my own: repos for my blog posts, my book, or hobby projects

Id: 1256
Section: Data Science, Machine Learning and AI
Type: Workshop
Target Audience: Advanced
Last Updated: