Big Data Analytics Using Apache Spark On IOT in Industrial way





Data science, also known as data-driven science, is an interdisciplinary field concerned with scientific methods, processes, and systems for extracting knowledge or insights from data in various forms, structured or unstructured, similar to data mining. This talk aims to walk you through all the steps of data science with Spark, from beginning to end.

Apache Spark provides developers with an API (application programming interface) centered on a data structure called the resilient distributed dataset, or RDD: a read-only collection of data items that is distributed over a cluster of machines and maintained in a fault-tolerant way. It was designed to remove a limitation of the MapReduce cluster computing paradigm, which forces a particular linear dataflow structure on distributed programs. A MapReduce program reads input data from disk, maps a function across the data, reduces the results of the map, and stores the reduction results on disk.

Under the MapReduce model, the data processing primitives are called mappers and reducers. Decomposing a data processing application into mappers and reducers is sometimes nontrivial. But once an application is written in MapReduce form, scaling it to run over, say, tens of thousands of machines in a cloud or cluster is merely a configuration change.

What this talk covers:

  • Basic Model of Spark

    • Spark Driver and Workers
    • Resilient Distributed Datasets
    • Why performance is faster
  • Spark with IoT

    • Using Spark in an IoT Analytics Platform
    • Analysis of IoT Device JSON Data
  • Ending notes and strong warnings
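
The map and reduce steps described in the abstract can be sketched in plain Python, without Spark. This is only a minimal illustration under stated assumptions, not the speaker's code: the sample readings, the field names `device_id` and `temp_c`, and the helper names `mapper`, `reducer`, and `map_reduce` are all hypothetical.

```python
import json
from collections import defaultdict
from functools import reduce

# Hypothetical IoT readings, one JSON document per line (made up for illustration).
RAW_READINGS = [
    '{"device_id": "sensor-a", "temp_c": 21.5}',
    '{"device_id": "sensor-b", "temp_c": 30.0}',
    '{"device_id": "sensor-a", "temp_c": 22.5}',
    '{"device_id": "sensor-b", "temp_c": 32.0}',
]


def mapper(line):
    """Map step: parse one JSON record into a (key, value) pair."""
    record = json.loads(line)
    # The value is a (sum, count) partial so that partials can be merged later.
    return record["device_id"], (record["temp_c"], 1)


def reducer(left, right):
    """Reduce step: merge two (sum, count) partials that share a key."""
    return left[0] + right[0], left[1] + right[1]


def map_reduce(lines):
    # Shuffle: group mapped values by key, as a framework does between phases.
    grouped = defaultdict(list)
    for key, value in map(mapper, lines):
        grouped[key].append(value)
    # Reduce each group to one (sum, count) pair, then derive the mean.
    totals = {key: reduce(reducer, values) for key, values in grouped.items()}
    return {key: total / count for key, (total, count) in totals.items()}


mean_temp_by_device = map_reduce(RAW_READINGS)
print(mean_temp_by_device)
```

Here everything runs in one process and the grouping is done by hand. In PySpark the same shape would typically be written with the RDD methods `map` and `reduceByKey`, with Spark handling the distribution of data across workers.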


Attendees are expected to know basic Python. Some familiarity with big data, data analytics, and IoT would be helpful.

Content URLs:

PPT slide link for this talk:

Speaker Info:

Shubham Sharma currently works with Spark and Hadoop as an associate software engineer at certaintyinfotech. He works on big data and analytics using Python, Pandas, Anaconda, Spark, etc.

He has strong roots in computer science and technology and believes that it is the only thing that can make this world a better place. He is also quite interested in mobile application security, Hadoop security, and web development, as they are an integral part of our lives.

Section: Data Analysis and Visualization
Type: Talks
Target Audience: Intermediate
Last Updated:


Shubham is a great Python programmer, and his talk will surely add much value to PyCon India 2017. Plus, the topic is a hot one: Gartner estimates that IoT implementations will generate a huge and unmanageable amount of data by 2025.

Er.Purnendu Prabhat 'Mukul' (~er.purnendu)

Shubham is a great Python developer, and the topic is related to IoT, which is booming right now, so the talk will add value to PyCon.

Ashmeet Chhabra (~ashmeet)
