Python weds Elephant: An Introduction to Map Reduce Programming with Python
by Jaganadh Gopinadhan (speaking)
Objective
The objective of the proposed talk is to introduce Map Reduce programming for Hadoop in Python. The talk aims to give illustrative examples of Python-based Hadoop and Map Reduce APIs such as Pydoop (http://pydoop.sourceforge.net/docs/) and MRJob (https://github.com/Yelp/mrjob). It also aims to give an overall comparison of the existing Pythonic Hadoop and Map Reduce APIs.
Description
Hadoop™ is a platform that provides distributed data storage and computing capabilities. The Hadoop ecosystem comes with a set of tools for dealing with big-data problems. The programming model for the Hadoop architecture is called Map Reduce. The concept of map and reduce comes from functional programming languages, and it is available in Python too. When Hadoop became a trend in big-data processing, a number of Python interfaces and APIs were developed for Map Reduce programming, some of them replicating Hadoop features such as HDFS and Map Reduce. The 'disco' and 'dumbo' projects are examples of this. Hadoop also has native support for running Python Map Reduce programs through its Hadoop Streaming functionality. The proposed talk discusses different tools and techniques for running Python programs in a Hadoop ecosystem.
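As a rough illustration of the streaming style mentioned above, here is a minimal word-count sketch in plain Python. It is not taken from the talk material. The function names `mapper`, `reducer` and `run_local` are hypothetical, and the sketch assumes the usual Hadoop Streaming conventions: tab-separated key/value records, with the reducer receiving records already sorted by key. Under Hadoop Streaming each function would typically live in its own script, reading from standard input and writing to standard output. Here the sort step is simulated locally so the logic can be tried without a cluster.

```python
from itertools import groupby


def mapper(lines):
    """Map step: emit a tab-separated (word, 1) record for every word."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(sorted_records):
    """Reduce step: sum the counts per word.

    Assumes the records arrive sorted by key, as the framework's
    shuffle/sort phase would deliver them.
    """
    pairs = (record.rstrip("\n").split("\t", 1) for record in sorted_records)
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


def run_local(lines):
    """Simulate map -> sort -> reduce on a single machine (no Hadoop needed)."""
    return list(reducer(sorted(mapper(lines))))


if __name__ == "__main__":
    for record in run_local(["the elephant", "The python"]):
        print(record)
```

The same mapper and reducer logic is what tools like MRJob, Dumbo and Pydoop wrap in their own APIs, so a sketch like this can serve as a baseline when comparing them.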
Talk outline
1) Introduction to HDFS
2) Introduction to Map Reduce
3) Hadoop Streaming
4) MRJob and its features
5) Dumbo and Disco
6) Introduction to Pydoop
7) Writing Java-like Map and Reduce programs with Pydoop
8) Playing with Pydoop and HDFS
9) Use cases of Python in the Hadoop ecosystem
Speaker bio
Jaganadh G is a Text Analytics / Mining Researcher and Developer. His areas of interest are Text Mining / Analytics, Natural Language Processing, Machine Learning, Sentiment Analysis, Big Data, Hadoop and Allied Technologies, NoSQL, and Free and Open Source Software. He holds a postgraduate degree in Sanskrit Nyaya (Indian Logic) from the University of Kerala. His ramblings on technological trends and book reviews can be found at http://jaganadhg.in.
Hi Jaganadh, this sounds like a tutorial to me.
Hi Anand,
The plan is to make it a semi-tutorial. More stress will be given to Pydoop and to comparing Pydoop's features with those of Dumbo, Disco and MRJob. Some running examples will be demonstrated for Pydoop and the other tools. If it fits the criteria for a tutorial, I am happy to move it to the tutorial section.