ScrapyQuickStart: Web Scraping Code Generation

Susmit Vengurlekar (~susmitpy)



Description:

Keeping code modular and adhering to good, team-defined standard practices is hard, and so is manually setting up the various files that the required project structure needs. The tool discussed in this talk aims to solve this problem for web scraping code written with Scrapy.

The tool (https://github.com/susmitpy/ScrapyExample) uses Jinja templates to generate code in a standard fashion for the item to be scraped. It does this in a modular way, keeping the code cleanly separated. The generated code also includes code that makes the scraped data available after execution without writing to an intermediate file, something that took me a long time to figure out.
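As a rough illustration of the idea described above, the sketch below renders a Scrapy spider from a Jinja template. It is not taken from the repository: the template text, the `render_spider` helper, the field dictionaries, and the example site and CSS selectors are all assumptions made for this example. The generated `run()` function collects items in memory through Scrapy's `item_scraped` signal, which is one common way to get the data back without an intermediate file and may differ from what the tool actually generates. Only `jinja2` is needed to run the generator itself; the generated code needs `scrapy`.

```python
import ast

from jinja2 import Environment, StrictUndefined

# Hypothetical template: the real tool's templates may look different.
SPIDER_TEMPLATE = '''\
import scrapy
from scrapy import signals
from scrapy.crawler import CrawlerProcess


class {{ class_name }}Item(scrapy.Item):
{%- for field in fields %}
    {{ field.name }} = scrapy.Field()
{%- endfor %}


class {{ class_name }}Spider(scrapy.Spider):
    name = "{{ spider_name }}"
    start_urls = ["{{ start_url }}"]

    def parse(self, response):
        for row in response.css("{{ row_selector }}"):
            item = {{ class_name }}Item()
{%- for field in fields %}
            item["{{ field.name }}"] = row.css("{{ field.selector }}").get()
{%- endfor %}
            yield item


def run():
    """Run the spider and return the scraped items as a list of dicts."""
    collected = []

    def on_item_scraped(item, response, spider):
        # Called once per scraped item; keeps the data in memory.
        collected.append(dict(item))

    process = CrawlerProcess(settings={"LOG_LEVEL": "WARNING"})
    crawler = process.create_crawler({{ class_name }}Spider)
    crawler.signals.connect(on_item_scraped, signal=signals.item_scraped)
    process.crawl(crawler)
    process.start()  # blocks until the crawl finishes
    return collected
'''


def render_spider(class_name, spider_name, start_url, row_selector, fields):
    """Render the spider source for one item type and check that it parses."""
    # StrictUndefined makes a missing template variable an error
    # instead of silently rendering an empty string.
    env = Environment(undefined=StrictUndefined)
    source = env.from_string(SPIDER_TEMPLATE).render(
        class_name=class_name,
        spider_name=spider_name,
        start_url=start_url,
        row_selector=row_selector,
        fields=fields,
    )
    ast.parse(source)  # fail early if the generated code is not valid Python
    return source


if __name__ == "__main__":
    # Illustrative values only; the site and selectors are assumptions.
    print(
        render_spider(
            class_name="Quote",
            spider_name="quotes",
            start_url="https://quotes.toscrape.com/",
            row_selector="div.quote",
            fields=[
                {"name": "text", "selector": "span.text::text"},
                {"name": "author", "selector": "small.author::text"},
            ],
        )
    )
```

Writing the returned string to a file such as `quotes_spider.py` and calling its `run()` would then return the scraped items as a list. Compared with copy-paste-modify, only the arguments to `render_spider` change from one item type to the next.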

The takeaways for the audience are listed in the talk outline below.

Outline

  1. Intro to Web Scraping
  2. Beautiful Soup vs Selenium vs Scrapy - What to use when (Takeaway #1)
  3. Code walk-through of basic generated Scrapy code, along with its execution (Takeaway #2)
  4. Generating the demoed code using Jinja Templates (Takeaway #3 - Inspiration for automating your own repetitive tasks)
  5. Using Jinja Templates vs Copy-Paste-Modify

Prerequisites:

Basic knowledge of Python. Since 25 minutes is long enough to explain the Scrapy code clearly, the audience need not have prior experience in web scraping.

Content URLs:

  1. https://github.com/susmitpy/ScrapyExample

Speaker Info:

I am a Data Scientist and Full Stack Software Developer. I have developed many software products using different technologies, backed by cost-optimised yet performant cloud-based system architecture, and have worked with MySQL, Postgres, Firebase Firestore, MongoDB, and Cassandra, as well as graph databases such as Neo4j. Along the way, I have learned to use the right tool for the right job in the right manner.

https://www.linkedin.com/in/susmit-vengurlekar

Speaker Links:

  • https://github.com/susmitpy/BuildingARecommendationEngineUsingNeo4jAzureBootcampTalk (Gave talk at Global Azure Bootcamp 2024 - Mumbai)
  • https://pypi.org/project/cache-df/
  • https://pypi.org/project/unitgen/
  • https://susmitpy.medium.com/

Section: Other
Type: Talk
Target Audience: Beginner