Web Scraping Using Selenium WebDriver and Data Analysis with Python
Selenium is usually thought of as an automation testing tool, but in this short session we will see how to pull data from the web using Selenium WebDriver and then perform exploratory analysis on it with the Python modules pandas, IPython and Matplotlib. We will scrape data from the publicly available IMDb website on the Filmfare Best Film award winners of the last 65 years and see what interesting facts the data reveals.
This exercise will help anyone who wants to understand how data can be pulled from a website with Selenium WebDriver and organized with Python libraries for analysis. During the session we will use publicly available data and see how to draw conclusions from it.
This paper was presented at Selenium Conference 2016.
Intro (5 mins)
- Why is data important?
- How can data be extracted from different sources?
- Tools used by data scientists for extracting, cleaning and analyzing data
- What is web scraping, and why is it needed?
Web Scraping & Data Analysis Tools (10 mins)
- Why Selenium WebDriver when libraries such as BeautifulSoup and Scrapy exist?
- Using Selenium WebDriver to extract data from the web
- Introduction to pandas and Matplotlib
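Using Selenium WebDriver to extract data can look roughly like the following. This is a minimal sketch, assuming Selenium 4 and a page where each award entry is rendered in an element matching a CSS selector. `ROW_SELECTOR`, `extract_rows` and `scrape` are hypothetical names, and the real selector has to be found by inspecting the page. The driver is passed in as an argument, so `extract_rows` works with Firefox, Chrome, or any object exposing the same `get` and `find_elements` methods.

```python
# Hypothetical selector: inspect the real page to find the element wrapping each entry.
ROW_SELECTOR = "div.award-row"


def extract_rows(driver, url, row_selector=ROW_SELECTOR):
    """Load url in the browser and return the non-empty text of each matching element."""
    driver.get(url)
    # "css selector" is the string value of selenium's By.CSS_SELECTOR
    elements = driver.find_elements("css selector", row_selector)
    texts = (el.text.strip() for el in elements)
    return [t for t in texts if t]


def scrape(url):
    """Launch Firefox, extract the rows from url, and always close the browser."""
    from selenium import webdriver  # lazy import so extract_rows can be used without Selenium installed

    driver = webdriver.Firefox()
    try:
        return extract_rows(driver, url)
    finally:
        driver.quit()
```

Because Selenium drives a real browser, `find_elements` sees the page after its JavaScript has run, which is one commonly cited reason to choose it over fetching raw HTML for BeautifulSoup. Pages that load content late may additionally need an explicit wait before the lookup.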
Demo (20 mins)
- Extracting data from the IMDb web page using Selenium WebDriver
- Arranging the data in a structured format
- Cleaning and reshaping the data
- Loading the data into a pandas DataFrame
- Using pandas functions for data analysis
- Visualization with Matplotlib
- Results and findings from the analysis
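The arranging, cleaning and pandas steps of the demo might look like the sketch below. It assumes, hypothetically, that each scraped text row reads `<year> | <film> | <director>`; the real IMDb layout may differ, so `COLUMNS`, the `|` separator and the helpers `rows_to_frame`, `top_directors` and `plot_top_directors` are illustrative names, not the talk's own code.

```python
import pandas as pd

# Hypothetical row layout: "<year> | <film> | <director>"; adjust to the real page.
COLUMNS = ["year", "film", "director"]


def rows_to_frame(rows, sep="|"):
    """Arrange raw text rows into a cleaned DataFrame sorted by year."""
    fields = [[part.strip() for part in row.split(sep)] for row in rows]
    # Drop rows that do not split into exactly the expected number of fields
    fields = [f for f in fields if len(f) == len(COLUMNS)]
    df = pd.DataFrame(fields, columns=COLUMNS)
    # Malformed years become NaN and are dropped before casting to int
    df["year"] = pd.to_numeric(df["year"], errors="coerce")
    df = df.dropna(subset=["year"]).astype({"year": int})
    # Reshape: add a decade column for grouping
    df["decade"] = df["year"] // 10 * 10
    return df.sort_values("year").reset_index(drop=True)


def top_directors(df, n=5):
    """Directors with the most wins, most frequent first."""
    return df["director"].value_counts().head(n)


def plot_top_directors(df, n=5):
    """Horizontal bar chart of top_directors (requires Matplotlib)."""
    import matplotlib.pyplot as plt  # lazy import: the helpers above work without it

    ax = top_directors(df, n).plot(kind="barh")
    ax.set_xlabel("Wins")
    plt.tight_layout()
    plt.show()
```

Coercing years with `pd.to_numeric(..., errors="coerce")` lets a single malformed row be dropped instead of aborting the whole load, and the `decade` column makes per-decade questions a one-liner such as `df.groupby("decade").size()`.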
Q&A (5 mins)
Basic knowledge of Selenium, Python and data analysis with pandas is a must.
Hi there, I'm Vinay Babu, a Team Lead at Trimble India Information Technology. I have around 10 years of experience and have worked in multiple roles as a developer, tester and business analyst. I'm a technology enthusiast and keep myself busy day in, day out with coding, learning and training. I started my career with Java, gradually moved to Python, and am currently exploring Python's scientific computing libraries for data analysis. My work involves developing a Selenium framework and automating the enterprise applications my company develops. I'm a heavy Python user and spend most of my time outside the office exploring Python libraries. When I'm not at work, I'm a husband and son, and I love spending time with my 2.5-year-old daughter.