Scraping with Python for Fun and Profit
Authors | Abhishek Mishra |
Level | Beginner |
Topic | Web programming |
Tags | newbie, web scraping, mechanize, beautifulsoup, urllib2, scrapy |
In his talk "On the Next Web", Tim Berners-Lee speaks about open, linked data. Sweet may the future be, but what if you need the data entangled in the vast web right now?
Inspired mostly by the author's work on SpojBackup, this talk familiarizes beginners with the ease and power of web scraping in Python. It will introduce the basics of the related modules (Mechanize, urllib2, BeautifulSoup, Scrapy) and demonstrate simple examples to get beginners started.
- What is web scraping all about?
- Some interesting use-cases
- urllib2 - passing your data, getting back results
- BeautifulSoup - parsing it out of the entangled web
- Mechanize - programmatic web browsing
- Scrapy - a web scraping framework
- Case study - putting it all together
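To give a flavour of the fetch-then-parse flow in the outline above, here is a minimal sketch rather than material from the talk itself. It assumes Python 3, where `urllib.request` is the successor of `urllib2`, and it uses the standard library's `html.parser` as a stand-in for BeautifulSoup so that it runs without extra installs. The names `LinkCollector`, `extract_links`, `fetch`, `SAMPLE_HTML` and the example URLs are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import Request, urlopen


class LinkCollector(HTMLParser):
    """Collect (link text, absolute URL) pairs for every <a href=...>."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._href = None   # URL of the <a> tag currently open, if any
        self._text = []     # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL.
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


def extract_links(html, base_url):
    """Parse an HTML string and return its links as a list of tuples."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


def fetch(url, timeout=10):
    """Download a page and return it as text (needs network access)."""
    # The User-Agent value is an arbitrary placeholder for this sketch.
    request = Request(url, headers={"User-Agent": "scraping-demo/0.1"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


# Hypothetical page content, used so the sketch runs offline.
SAMPLE_HTML = """
<html><body>
  <a href="/problems">Problems</a>
  <a href="http://example.org/blog">Blog</a>
  <a>No href here</a>
</body></html>
"""

if __name__ == "__main__":
    for text, url in extract_links(SAMPLE_HTML, "http://example.com/"):
        print(text, "->", url)
```

To scrape a live page, the same parser could be fed with `extract_links(fetch(url), url)`; BeautifulSoup, Mechanize and Scrapy offer higher-level ways of doing the same fetch and parse steps.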
Abhishek is a final-year student of Computer Science at Amrita School of Engineering, Bangalore. Last year he gave an ultra-lightning talk on Web2Hunter, a domain name generator. He loves being able to do a lot with Python in fewer lines and less time.
He is currently working for Sahana Software Foundation on Sahana Eden as a part of Google Summer of Code.
Blog: http://blog.ideamonk.in/
Code: http://github.com/ideamonk
file | size | uploaded | comment |
---|---|---|---|
transcript.txt | 68 bytes | september 10, 2010 | for low bandwidth situations |
PyCon2010-Scraping_with_Python_for_Fun_and_Profit__ideamonk.pdf | 7.8 MB | september 10, 2010 | Main presentation slides, media rich / heavy. |