Taming Big-Data and Data Plumbing with Python

ROSHAN ZAMEER (~roshan77)


Description:

Data science, analytics, machine learning, big data… All familiar terms in today’s tech headlines, but they can seem daunting, opaque or simply impossible. Despite the hype, they are real fields and you can master them! We’ll dive into what big data consists of and how we can use Python to tackle the problems it poses.

Data science is a large field covering everything from data collection, cleaning and standardisation to analysis, visualisation and reporting. Depending on your interests, there are many different positions, companies and fields that touch data science. You can use data science to analyse language, recommend videos, or identify new products from customer or marketing data. Whether it’s for a research field, your own business or the company you work for, there are many opportunities to use data science and analysis to solve your problems.

When we talk about using big data in data science, we are talking about large-scale data science. What counts as “big” depends a bit on who you ask. Most projects or questions you’d like to answer don’t require big data, since the dataset is small enough to be downloaded and parsed on your own computer. Most big data problems arise from data that can’t be held on one computer. If your data needs several computers to store, you can benefit from big data parsing libraries and analytics.
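
To make the "more than one computer" idea concrete, here is a minimal sketch using only the Python standard library. It is an illustration, not the pipeline from the talk: the `partitions` list is made-up data standing in for shares of a dataset stored on separate machines, and `count_words` and `merge_counts` are hypothetical helper names. Each partition is summarised on its own, and only the small summaries are combined.

```python
from collections import Counter
from functools import reduce


def count_words(partition):
    """Map step: count the words in one partition of the data."""
    counts = Counter()
    for line in partition:
        counts.update(line.lower().split())
    return counts


def merge_counts(left, right):
    """Reduce step: combine two partial word counts into one."""
    return left + right


# Hypothetical data: each inner list stands in for the share of a
# dataset held on a different machine.
partitions = [
    ["big data is big", "python is fun"],
    ["data pipelines move data"],
]

# Each partition is processed independently (here, one after another).
partial_counts = [count_words(p) for p in partitions]

# Only the small per-partition summaries need to be brought together.
total = reduce(merge_counts, partial_counts, Counter())
print(total.most_common(3))
```

Big data libraries apply the same split, process, combine pattern, but run the per-partition step on many machines in parallel.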

So what does Python have to do with it? Python has emerged over the past few years as a leader in data science programming. While there are still plenty of folks using R, SPSS, Julia or other popular languages and tools, Python’s growing popularity in the field is evident in the growth of its data science libraries. Let’s take a look at a few of the tools and build a pipeline to tackle a real-world challenge.

Prerequisites:

Basic knowledge of Python.

Content URLs:

https://github.com/roshanzameer/TwitterSentimentAnalysis

Speaker Info:

Roshan Zameer is a graduate in Electronics and Communication Engineering and a Python and Big-Data enthusiast based in Bangalore. He has over 4 years of experience in solving the challenges of Big-Data. He currently works as a Data Engineer at Edge Networks, an AI-based HR and workforce optimisation company, where he is building the Data Pipeline and Machine Learning Platform. Previously, he worked as a Data Engineer at Euromonitor International, a UK-based Market Research and Intelligence company, where he worked on a Data Pipeline enabling Data Discovery, Forecast, Prediction and Analytics.

On weekends, he speaks at meetups and seminars, hangs out with his college buddies, and plays guitar and snooker.

Speaker Links:

https://www.linkedin.com/in/roshan-zameer/

Section: Data Science, Machine Learning and AI
Type: Community-poster
Target Audience: Intermediate
Last Updated: