Advanced Data Analysis using Pandas/Numpy
Anant Gupta (~anant79)
Description:
Everyone is familiar with Pandas and Numpy; they are the "Hello World" of Data Science in Python. However, in the fancy world of Machine Learning and Artificial Intelligence, we often tend to overlook modest data structures such as the Pandas DataFrame and the Numpy array.
In this workshop, we will go through the following:
- Internal Structure of Pandas/Numpy/Lists
- Understanding the Numpy implementation on GitHub (15 minutes)
- https://github.com/numpy/numpy/tree/master/numpy/core
- Understanding the Pandas implementation on GitHub (15 minutes)
- https://github.com/pandas-dev/pandas/blob/master/pandas/core/frame.py
- Understanding the list implementation on GitHub (10 minutes)
- https://github.com/python/cpython/blob/master/Objects/listobject.c
- Operations performed on Pandas/Numpy (20 minutes)
- Numpy operations
- Slicing/dicing as part of data preprocessing
- Should we save Numpy arrays? Real-time server applications
- Numpy arrays in multithreaded/multiprocess modules
- Some very useful but lesser-known Numpy functions
- Pandas operations (30 minutes)
- Joins
- Filters and preprocessing (including datetime operations)
- Operations on slices of data
- Is map better than apply, or is neither the best choice? Confirming with a simple use case
- Storage efficiency for large Pandas DataFrames (real-time server applications)
- List operations (15 minutes)
- List comprehensions
- Power of lists over Numpy
- Where should generators be used? https://github.com/python/cpython/blob/master/Lib/email/generator.py
- Cython (20 minutes)
- What is Cython?
- Examples of Cython and how to use it on a daily basis
- End-to-end use case (25 minutes)
- Write your own custom, fast, scalable K-Means algorithm
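The "map vs. apply" question in the agenda can be checked with a quick benchmark along these lines. This is a minimal sketch, not the workshop's own example: the `price` column, the 18% tax in `with_tax`, and the helper names `via_map`, `via_apply` and `via_vectorized` are hypothetical, chosen only for illustration.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical toy data: 200,000 random prices between 1 and 100.
rng = np.random.default_rng(seed=0)
df = pd.DataFrame({"price": rng.uniform(1.0, 100.0, size=200_000)})


def with_tax(x):
    """Plain-Python per-element function (assumed 18% tax, illustration only)."""
    return x * 1.18


def via_map(frame):
    # Series.map calls the Python function once per element.
    return frame["price"].map(with_tax)


def via_apply(frame):
    # Series.apply with a plain Python function is also evaluated element by element.
    return frame["price"].apply(with_tax)


def via_vectorized(frame):
    # "Neither": one vectorized multiplication that runs in compiled NumPy code.
    return frame["price"] * 1.18


if __name__ == "__main__":
    for fn in (via_map, via_apply, via_vectorized):
        seconds = timeit.timeit(lambda: fn(df), number=3)
        print(f"{fn.__name__:>15}: {seconds:.3f} s for 3 runs")
```

All three helpers produce the same values; only the execution path differs. Exact timings depend on hardware and the pandas version, but the vectorized form usually wins because `map` and `apply` invoke a Python function for every element.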
By the end of the workshop, everyone will have learned how to cut their preprocessing time in Python substantially.
Prerequisites:
- Basic Python
- Experience in data analysis in any language
Speaker Info:
My name is Anant Gupta, and I have 8.5 years of experience in the industry. I have worked on several technologies (some of which are no longer used in the market :) ). My tryst with Python began four years ago, and I have never looked back since.
I currently work as a Data Scientist at Ericsson, and this talk is an amalgamation of problem statements I have faced as part of my work.
Speaker Links:
- https://github.com/anantguptadbl/python : My repository for all python stuff
- https://hasgeek.tv/fifthelephant/2018-day-2/1576-deep-portfolio-using-neural-networks-for-portfolio-construction-anant-gupta : My talk on using Neural Networks in the world of Finance