Making a custom crawler using Scrapy and Beautiful Soup

by Charul (speaking)

Section: Software Development Tools
Technical level: Intermediate

Objective

In this session I will discuss how to develop a custom crawler on top of Python's Scrapy framework. I will cover the basic steps for getting started with Scrapy and how to write optimized code so that it crawls only what you need rather than the whole web. I will also talk about the motivation behind using a custom crawler, using my own example: how I built a crawler for financial news articles.

I will also tell you more about Python's Beautiful Soup, which can ease your work in handling HTML and XML data.
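
As a rough illustration of the kind of clean-up Beautiful Soup makes easy, the sketch below strips scripts and styles from a page and keeps only the headline and paragraph text. The markup in RAW_HTML and the helper name extract_article are made up for this example; real news pages will need their own selectors.

```python
from bs4 import BeautifulSoup

# Made-up article markup standing in for a downloaded page (an assumption).
RAW_HTML = """
<html><body>
  <h1>Markets close higher</h1>
  <script>trackPageView();</script>
  <div class="story"><p>Stocks rose on Friday.</p><p>Bond yields fell.</p></div>
</body></html>
"""


def extract_article(html):
    """Return the headline and paragraph text, ignoring scripts and styles."""
    # "html.parser" is the parser bundled with Python, so no extra install.
    soup = BeautifulSoup(html, "html.parser")
    for tag in soup(["script", "style"]):
        tag.decompose()  # drop non-content elements in place
    title = soup.h1.get_text(strip=True) if soup.h1 else ""
    paragraphs = [p.get_text(strip=True) for p in soup.find_all("p")]
    return {"title": title, "body": " ".join(paragraphs)}


article = extract_article(RAW_HTML)
```

Here article["title"] holds the headline and article["body"] the joined paragraph text, ready for storage or further processing.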

Description

This session will be divided into the following topics:
1. Why we need custom crawlers, and some success stories about using them.
2. How to create a custom web crawler on top of the Scrapy framework.
3. How to write an optimized Scrapy script.
4. How to pre-process HTML and XML data using Beautiful Soup.

Requirements

Attendees should have Scrapy, Requests and Beautiful Soup installed on their laptops.

Speaker bio

I am currently a third-year undergraduate student at IIIT Allahabad. I have been selected as an intern for Google Summer of Code 2014 under the Fedora organisation. Prior to this, I worked as an intern on the Datagrepper project under the GNOME Outreach Program for Women. I have also worked with a few startup companies as a freelancer and contributed to various open source projects.

Links

1. Fedora profile (https://fedoraproject.org/wiki/User:Charul)
2. GitHub: charulagrl (https://github.com/charulagrl)
3. Blog (http://honeycoding.wordpress.com/)
4. GSoC proposal (https://fedoraproject.org/w/index.php?title=GSOC_2014/Student_Application_charul)

Comments

Kushal Das, 268 days ago

You should add more details in the talk description.
