An intro to Web Scraping, dos & don'ts and the challenges in Scaling it to huge volumes
rajaemmela
Description:
An introduction to web scraping.
A complete walkthrough of the following items:
- Challenges in scraping websites and parsing the data
- Introducing Scrapy, a widely used framework to extract data
- Dos & Don'ts
- Usage of Proxies & IP Rotation
- Crawling hundreds of websites, running and scaling them to huge volumes
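As a rough illustration of the proxy and IP rotation item above, here is a minimal round-robin sketch using only the Python standard library. The proxy addresses, the `ProxyRotator` class, and its method names are hypothetical placeholders, not part of the talk material or of Scrapy; a production crawler would likely also need health checks and retries.

```python
import itertools
import urllib.request

# Hypothetical proxy endpoints (assumption); replace with real ones.
PROXIES = [
    "http://10.0.0.1:8080",
    "http://10.0.0.2:8080",
    "http://10.0.0.3:8080",
]


class ProxyRotator:
    """Hands out proxies in round-robin order (illustrative helper)."""

    def __init__(self, proxies):
        if not proxies:
            raise ValueError("at least one proxy is required")
        self._cycle = itertools.cycle(proxies)

    def next_proxy(self):
        # Return the next proxy in the rotation, wrapping around at the end.
        return next(self._cycle)

    def build_opener(self):
        # Build a urllib opener that routes HTTP and HTTPS through the next proxy.
        proxy = self.next_proxy()
        handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
        return proxy, urllib.request.build_opener(handler)


rotator = ProxyRotator(PROXIES)
# Four picks from three proxies: the fourth wraps back to the first.
picked = [rotator.next_proxy() for _ in range(4)]
print(picked)
```

Each request would call `rotator.build_opener()` and use the returned opener's `open(url)` method, so consecutive requests leave through different addresses.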
Prerequisites:
Laptop with Ubuntu or a similar OS, with the latest versions of Python and MySQL installed.
Basic understanding of Python and MySQL. Good to have: knowledge of writing XPaths and using proxies.
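For anyone new to XPath, the sketch below shows the kind of expression involved, using the standard library's `xml.etree.ElementTree`. The sample markup, the `extract_products` function, and the field names are made up for illustration. Note that `ElementTree` supports only a subset of XPath and requires well-formed markup, so real pages are usually parsed with a more tolerant parser.

```python
import xml.etree.ElementTree as ET

# Made-up, well-formed sample markup (assumption); real pages are rarely this clean.
SAMPLE = """
<html>
  <body>
    <div class="product"><h2>Widget</h2><span class="price">10.50</span></div>
    <div class="product"><h2>Gadget</h2><span class="price">24.00</span></div>
  </body>
</html>
"""


def extract_products(markup):
    """Return a list of {name, price} dicts from the sample markup."""
    root = ET.fromstring(markup.strip())
    items = []
    # Select every <div> at any depth whose class attribute is exactly "product".
    for node in root.findall(".//div[@class='product']"):
        name = node.findtext("h2")
        price = node.findtext("span[@class='price']")
        items.append({"name": name, "price": float(price)})
    return items


products = extract_products(SAMPLE)
print(products)
```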
Content URLs:
https://atad.xyz [ Will share the GitHub repo during the talk with sample web crawlers ]
Speaker Info:
I am Raja Emmela. I run Headrun Technologies, Bangalore, helping clients with data scraping and web applications.
We have been in this space for the last seven years, extracting and parsing data. I will draw on that experience to share the challenges we faced while scraping websites for domestic, NA, and APAC clients, and the don'ts in particular.