An Intro to Web Scraping: Dos & Don'ts and the Challenges of Scaling It to Huge Volumes

rajaemmela | 14 May, 2018


Description:

An introduction to web scraping.

A complete walkthrough of the following topics:

- Challenges in scraping websites and parsing the data
- Introducing Scrapy, a widely used framework for extracting data
- Dos & don'ts
- Using proxies and IP rotation
- Crawling hundreds of websites, running them, and scaling them to huge volumes
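As a rough illustration of two of the "dos" (respecting robots.txt and pacing requests) and of proxy rotation, here is a minimal standard-library sketch. It is not the speaker's code: the proxy addresses, the robots.txt content, the user agent and the names `ProxyRotator` and `polite_fetch` are hypothetical placeholders chosen for the example.

```python
import itertools
import time
import urllib.request
import urllib.robotparser

# Hypothetical proxy pool; replace with real proxy endpoints.
PROXIES = [
    "http://10.0.0.1:8080",
    "http://10.0.0.2:8080",
    "http://10.0.0.3:8080",
]

# Hypothetical robots.txt content, parsed offline for illustration.
ROBOTS_TXT = """
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""


def build_robots_parser(robots_txt: str) -> urllib.robotparser.RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


class ProxyRotator:
    """Round-robin over a proxy pool, building one opener per request."""

    def __init__(self, proxies):
        self._cycle = itertools.cycle(proxies)

    def next_opener(self):
        proxy = next(self._cycle)
        handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
        return proxy, urllib.request.build_opener(handler)


def polite_fetch(url, robots, rotator, user_agent="example-bot", delay=None):
    """Fetch url through the next proxy, unless robots.txt disallows it."""
    if not robots.can_fetch(user_agent, url):
        return None  # Don't: fetch paths the site has disallowed.
    # Do: wait between requests, preferring the site's own Crawl-delay.
    crawl_delay = robots.crawl_delay(user_agent)
    time.sleep(delay if delay is not None else (crawl_delay or 1))
    proxy, opener = rotator.next_opener()
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with opener.open(request, timeout=10) as response:
        return response.read()
```

Round-robin is only the simplest rotation policy; production crawlers often also retire proxies that start failing. Scrapy has built-in counterparts for some of this, such as its `ROBOTSTXT_OBEY` and `DOWNLOAD_DELAY` settings.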

Prerequisites:

A laptop with Ubuntu or a similar OS, with the latest versions of Python and MySQL installed

Basic understanding of Python and MySQL. Good to have: knowledge of writing XPath expressions and using proxies
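For readers new to XPath, here is a small hedged example of XPath-style extraction. It uses the standard library's `xml.etree.ElementTree`, which supports only a limited XPath subset and needs well-formed markup; real-world HTML is usually handled by a tolerant parser such as lxml or Scrapy's selectors. The markup, class names and `parse_products` function are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed product listing markup used only for illustration.
SAMPLE = """
<html>
  <body>
    <div class="product">
      <h2>Laptop</h2>
      <span class="price">45000</span>
    </div>
    <div class="product">
      <h2>Mouse</h2>
      <span class="price">500</span>
    </div>
  </body>
</html>
"""


def parse_products(markup: str) -> list:
    """Extract name/price pairs using ElementTree's XPath subset."""
    root = ET.fromstring(markup.strip())
    products = []
    # ".//div[@class='product']" selects every product block at any depth.
    for block in root.findall(".//div[@class='product']"):
        name = block.findtext("h2", default="").strip()
        price = block.findtext("span[@class='price']", default="").strip()
        products.append({"name": name, "price": int(price) if price else None})
    return products


if __name__ == "__main__":
    for item in parse_products(SAMPLE):
        print(item)
```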

Content URLs:

https://atad.xyz (the GitHub repo with sample web crawlers will be shared during the talk)

Speaker Info:

I am Raja Emmela. I run Headrun Technologies, Bangalore, which helps clients with data scraping and web applications.

We have been in this space for the last seven years, extracting and parsing data. Drawing on that experience, I will share the challenges we faced while scraping websites for domestic, NA and APAC clients, and the don'ts in particular.

Speaker Links:

- LinkedIn
- Twitter
- Blog

Section:	Data science
Type:	Talks
Target Audience:	Intermediate
Last Updated:	14 May, 2018
