Scraping with Python for Fun and Profit

Authors: Abhishek Mishra
Level: Beginner
Topic: Web programming
Tags: newbie, web scraping, mechanize, beautifulsoup, urllib2, scrapy
Summary

In his talk "On the Next Web", Tim Berners-Lee speaks about open, linked data. Sweet may the future be, but what if you need the data entangled in the vast web right now?

Inspired mostly by the author's work on SpojBackup, this talk familiarizes beginners with the ease and power of web scraping in Python. It will introduce the basics of the related modules (Mechanize, urllib2, BeautifulSoup, and Scrapy) and demonstrate simple examples to get beginners started.

Outline
  • What is web scraping all about?
  • Some interesting use-cases
  • urllib2 - passing your data, getting back results
  • BeautifulSoup - parsing it out of the entangled web
  • Mechanize - programmatic web browsing
  • Scrapy - a web scraping framework
  • Case study - putting it all together
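
As a rough illustration of the first two outline items, the sketch below builds a form submission and pulls links out of a page. It is not taken from the talk. It assumes Python 3, where the functionality of urllib2 lives in urllib.request and urllib.parse, and it substitutes the standard library's html.parser for BeautifulSoup so that it runs without extra installs. The URL, form fields, and sample HTML are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical form endpoint and fields, for illustration only.
FORM_URL = "http://example.com/search"
form_data = urlencode({"q": "python", "page": 1}).encode("ascii")

# Supplying data makes this a POST request.
# urllib.request.urlopen(request) would actually send it.
request = Request(FORM_URL, data=form_data)


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag seen while parsing."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


# Stand-in for a downloaded page, so the sketch runs offline.
SAMPLE_HTML = """
<html><body>
  <a href="/problems/1">First</a>
  <p>No link here</p>
  <a href="/problems/2">Second</a>
</body></html>
"""

collector = LinkCollector()
collector.feed(SAMPLE_HTML)
links = collector.links

print(request.get_method(), request.data)
print(links)
```

With BeautifulSoup installed, the LinkCollector class could be replaced by a few lines of tag searching; the request-building half stays the same.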
Notes
Profile of the authors

Abhishek is a final-year student of Computer Science at Amrita School of Engineering, Bangalore. Last year he gave an ultra-lightning talk on Web2Hunter, a domain name generator. He loves being able to do a lot with Python in fewer lines and less time.

He is currently working with the Sahana Software Foundation on Sahana Eden as part of Google Summer of Code.

Blog: http://blog.ideamonk.in/
Code: http://github.com/ideamonk

Files
file | size | uploaded | comment
transcript.txt | 68 bytes | September 10, 2010 | for low bandwidth situations
PyCon2010-Scraping_with_Python_for_Fun_and_Profit__ideamonk.pdf | 7.8 MB | September 10, 2010 | Main presentation slides, media rich / heavy.
