Objective
To give an insight into analyzing and visualizing a dataset using various Python libraries, and to get the audience started in NLP using NLTK.
After the talk, the audience should have a good understanding of how to use Python as an effective Data Science tool.
Description
Data Science has become a buzzword in recent days, and Python has emerged as a very powerful tool for Data Scientists.
The main motive behind this session is to get people started with using Python for analyzing datasets, right from munging and wrangling a dataset to developing and deploying some cool Machine Learning algorithms.
By the end of the session, the participants should be well acquainted with the libraries essential for Data Science practice, and with scraping and analyzing data from APIs, databases, and files available on the local system. They will also get started with NLTK, try to make sense of text snippets, and get a feel for the magic of search engines and the complexity of the NLP world.
Requirements
A laptop with all the libraries installed.
All the necessary libraries can be installed at once with the Anaconda distribution from Continuum Analytics.
The pandasql library is not available in the Anaconda distribution, so please install it separately.
Some alternatives to Anaconda: Python(x,y), Canopy.
It would be better if the participants get their IDEs configured for Anaconda. The steps are available in the Anaconda documentation.
Speaker bio
I am a 20-year-old undergrad at IIT Jodhpur. As a passionate Data Scientist, I have been practicing Data Science and pursuing active research, with a research paper to my credit.
I have given talks on Data Science and Data metrics as part of the Mozilla Team, so I am experienced in providing a learner-friendly environment and an interesting, engaging talk for beginners in Data Science, and in helping them take home the confidence to get started and keep going with the skill.
Being an undergraduate student, I would relate well to the audience and can help them understand the concepts from the root level, having learned the importance of that as an active learner myself.
LinkedIn profile: http://in.linkedin.com/in/jalemrajrohit
Personal Website: http://dawny33.github.io/
Hi Jalem,
Can you share how you are planning to structure your talk for 45 minutes?
A rough flow would help us understand the various topics you will touch on, as Data Science is a very broad topic.
Regards
Konark
Hi Konark,
As this talk is intended for beginners, it aims to give a flavour of how Python can be used as a versatile tool across most domains of data science.
The first 15 minutes would cover data retrieval (from Twitter or Cricbuzz; I'd be using tweepy for that), data munging (Cricbuzz data has numeric data too, so munging would make perfect sense), and data analysis using pandas and numpy (simple statistical analyses).
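To give a rough idea of the munging and analysis step, a minimal sketch might look like the following. The records, column names, and values are made up for illustration (they are not real Cricbuzz data), and the retrieval step is left out because it needs API credentials.

```python
import pandas as pd

# Hypothetical raw records standing in for scraped cricket scores.
raw = pd.DataFrame({
    "batsman": ["A", "B", "A", "B", "C"],
    "runs": ["45", "102", "n/a", "8", "67"],  # numbers arrive as text
    "balls": [30, 110, 12, 15, 70],
})

# Munging: coerce the text column to numbers; unparseable values become NaN.
raw["runs"] = pd.to_numeric(raw["runs"], errors="coerce")

# Drop the rows that could not be parsed and derive a new column.
clean = raw.dropna(subset=["runs"]).copy()
clean["strike_rate"] = 100 * clean["runs"] / clean["balls"]

# Simple statistical analysis: mean runs per batsman.
summary = clean.groupby("batsman")["runs"].mean()
print(summary)
```

The same pattern (coerce, drop or fill, derive, aggregate) would apply to whatever numeric fields the retrieved data actually contains.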
The next 15 minutes would be for demonstrating ML algorithms in scikit-learn and statsmodels. The munged data can be reused for this part.
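As an illustration of what such a demonstration could look like, here is a minimal sketch that fits a linear regression with scikit-learn. The choice of model and the numbers are assumptions made for this example, not the algorithms or data planned for the talk.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical munged data: balls faced (feature) and runs scored (target).
balls = np.array([[10], [20], [30], [40], [50]])
runs = np.array([12, 25, 33, 47, 58])

# Fit an ordinary least squares line to the data.
model = LinearRegression().fit(balls, runs)

# Predict runs for an unseen innings of 60 balls.
predicted = model.predict(np.array([[60]]))
print(model.coef_[0], model.intercept_, predicted[0])
```

Other scikit-learn estimators follow the same fit and predict pattern, so swapping in a different model changes very little of the surrounding code.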
The last 15 minutes would be a basic introduction to NLP. As most of the audience would be relatively new to the NLP lingo, some basic analyses would be done on either the mined Twitter data or available corpora.
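A basic analysis of this kind could be as simple as a word-frequency count. The sketch below uses NLTK's `wordpunct_tokenize` and `FreqDist`, which need no extra corpus downloads; the two sample texts are invented for illustration and stand in for mined tweets.

```python
from nltk import FreqDist
from nltk.tokenize import wordpunct_tokenize

# Hypothetical text snippets standing in for mined tweets.
tweets = [
    "Python makes data science fun",
    "Data science with Python and NLTK",
]

# Tokenize each snippet and lowercase the tokens so counts are case-insensitive.
tokens = [tok.lower() for tweet in tweets for tok in wordpunct_tokenize(tweet)]

# Count how often each token occurs across all snippets.
freq = FreqDist(tokens)
print(freq.most_common(3))
```

From here, typical next steps would be removing stop words or stemming, both of which NLTK also supports.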