Advanced Data Analysis using Pandas/Numpy

Anant Gupta (~anant79)


Description:

Everyone is familiar with Pandas and Numpy. They are the Hello World of Data Science in Python. However, in the fancy world of Machine Learning and Artificial Intelligence, we often tend to overlook the modest data structures that Pandas and Numpy provide.

In this workshop, we will go through the following:

  • Internal Structure of Pandas/Numpy/Lists
    • Understanding the numpy implementation on GitHub ( 15 minutes )
      • https://github.com/numpy/numpy/tree/master/numpy/core
    • Understanding the pandas implementation on GitHub ( 15 minutes )
      • https://github.com/pandas-dev/pandas/blob/master/pandas/core/frame.py
    • Understanding the list implementation on GitHub ( 10 minutes )
      • https://github.com/python/cpython/blob/master/Objects/listobject.c
    • Operations performed on Pandas/Numpy ( 20 minutes )
      • Numpy operations
      • Slicing/Dicing as part of data pre-processing
      • Should we save numpy arrays? Realtime Server applications
      • Numpy arrays in multithreaded/multiprocess modules
      • Some very useful but less known numpy functions
    • Pandas Operations ( 30 minutes )
      • Joins
      • Filters and PreProcessing ( including datetime operations )
      • Operations on slices of data
      • Is map better, or apply, or neither? Confirm with a simple use case
      • Storage efficiency for large pandas dataframes ( Realtime Server applications )
    • List Operations ( 15 minutes )
      • List comprehension
      • Power of lists over numpy
      • Where should generators be used? See https://github.com/python/cpython/blob/master/Lib/email/generator.py
  • Cython ( 20 minutes )
    • What is cython
    • Examples of cython and how to use it on a daily basis
  • End to End use case ( 25 minutes )
    • Write your own custom, fast, scalable K-Means Algorithm
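
As a rough idea of what the end-to-end use case could look like, here is a minimal sketch of K-Means (Lloyd's algorithm) written with NumPy broadcasting. It is not the workshop's actual implementation; the function names `assign_labels` and `kmeans`, the parameters, and the random initialisation are all assumptions made for illustration.

```python
import numpy as np

def assign_labels(points, centroids):
    """Return the index of the nearest centroid for every point."""
    # Broadcasting gives an array of shape (n_points, k, n_features).
    diffs = points[:, None, :] - centroids[None, :, :]
    # Squared Euclidean distance to each centroid, then the closest one.
    return (diffs ** 2).sum(axis=2).argmin(axis=1)

def kmeans(points, k, n_iter=20, seed=0):
    """Illustrative K-Means; `points` is a float array of shape (n, d)."""
    rng = np.random.default_rng(seed)
    # Assumption: start from k distinct rows chosen at random.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        labels = assign_labels(points, centroids)
        # Recompute each centroid; keep the old one if its cluster is empty.
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # assignments have stopped moving the centroids
        centroids = new_centroids
    return centroids, assign_labels(points, centroids)
```

The distance step avoids a Python loop over points, at the cost of a temporary array of size n × k × d, so for very large inputs it would need to be processed in chunks.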

By the end of the workshop, attendees will have learned techniques to cut their pre-processing time in Python substantially.
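
To give a flavour of the "map or apply" question from the outline, the sketch below times a row-wise `DataFrame.apply` against the equivalent vectorized column arithmetic. The column names `price` and `qty`, the data, and the helper `timed` are made up for illustration; actual timings depend on the machine and the library versions.

```python
import time
import numpy as np
import pandas as pd

n_rows = 20_000
rng = np.random.default_rng(0)
# Hypothetical data: a float column and an integer column.
df = pd.DataFrame({
    "price": rng.uniform(1, 100, n_rows),
    "qty": rng.integers(1, 10, n_rows),
})

def timed(fn):
    """Run fn() once and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start

# Row-wise: the lambda is called once per row from Python.
row_wise, t_apply = timed(lambda: df.apply(lambda r: r["price"] * r["qty"], axis=1))
# Vectorized: a single multiplication over whole columns.
vectorized, t_vec = timed(lambda: df["price"] * df["qty"])

print(f"apply(axis=1): {t_apply:.4f}s, vectorized: {t_vec:.4f}s")
```

Both approaches produce the same values; the difference is only in how much of the work happens inside a Python-level loop.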

Prerequisites:

  1. Basic Python
  2. Experience in data analysis in any language

Speaker Info:

My name is Anant Gupta and I have 8.5 years of experience in the industry. I have worked on several technologies ( some of them are no longer used in the market :) ). My tryst with Python began 4 years back and I have never looked back since then.

Currently I work as a Data Scientist at Ericsson, and this workshop is an amalgamation of problem statements that I have faced as part of my work.

Speaker Links:

  1. https://github.com/anantguptadbl/python : My repository for all python stuff
  2. https://hasgeek.tv/fifthelephant/2018-day-2/1576-deep-portfolio-using-neural-networks-for-portfolio-construction-anant-gupta : My talk on using Neural Networks in the world of Finance

Section: Data Science, Machine Learning and AI
Type: Workshop
Target Audience: Beginner
Last Updated: