Visualizing government reports with Python

arjoonn sharma (~theSage21)


Description:

  • We talk about how and where the government is making data publicly available.
  • We discuss what it is doing wrong.
  • We learn how to use this data to understand what is happening.
  • We get to use Python.

For demonstration, we pick police crime records and compare them across years and locations.

Verbose

Gathering

  • The government is continuously releasing data via http://data.gov.in and various other state-owned websites.
  • Most of the data is scattered and in forms that make analysis difficult, for example PDF files.
  • We use Python to scrape links and save raw data from such sites.
  • Selenium, BeautifulSoup and html2text come into play.
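
As a rough illustration of the link-scraping step, the sketch below pulls links to data files out of a listing page. It is not the talk's actual code: to stay dependency-free it swaps in the standard library's `html.parser` where the talk would use BeautifulSoup, and the page fragment, the `example.org` address, the file names and the names `LinkCollector` and `extract_data_links` are all made up for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute URLs of <a> links that end in a data-file extension."""

    def __init__(self, base_url, extensions=(".pdf", ".csv", ".xls", ".xlsx")):
        super().__init__()
        self.base_url = base_url
        self.extensions = extensions
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Compare in lower case so "report.CSV" is matched as well.
        if href and href.lower().endswith(self.extensions):
            # Resolve relative links against the page they were found on.
            self.links.append(urljoin(self.base_url, href))


def extract_data_links(html, base_url):
    """Return every data-file link found in the given HTML text."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    return parser.links


# Hypothetical page fragment; a real run would download the page first.
SAMPLE_HTML = """
<ul>
  <li><a href="/files/crime_2013.pdf">Crime records 2013</a></li>
  <li><a href="reports/crime_2014.CSV">Crime records 2014</a></li>
  <li><a href="/about">About</a></li>
</ul>
"""

links = extract_data_links(SAMPLE_HTML, "http://example.org/catalog/")
for link in links:
    print(link)
```

Pages that build their listings with JavaScript would not expose these links in the raw HTML, which is presumably where a browser-driving tool such as Selenium comes in.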

Consolidation

  • We merge all of our different sources into a single coherent dataset.
  • We get rid of inconsistencies and other blemishes.
  • Pandas comes into play here.
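
A minimal sketch of what this consolidation might look like with Pandas is given below. The two tables, their column names and all the figures are invented for illustration, and `tidy` is a hypothetical helper rather than anything from the talk; the point is only to show column names, spellings and number formats being brought into line before the tables are stacked.

```python
import pandas as pd

# Hypothetical per-year tables, as they might look right after extraction.
records_2013 = pd.DataFrame(
    {
        "State": ["Kerala ", "GOA"],
        "Crime Head": ["Theft", "Theft"],
        "Cases": ["1,204", "310"],
    }
)
records_2014 = pd.DataFrame(
    {
        "state": ["kerala", "Goa"],
        "crime_head": ["theft", "Theft"],
        "cases": [1150, None],
    }
)


def tidy(frame, year):
    """Normalise column names, text values and counts, then tag the year."""
    out = frame.copy()
    out.columns = [c.strip().lower().replace(" ", "_") for c in out.columns]
    for col in ("state", "crime_head"):
        out[col] = out[col].str.strip().str.title()
    # Counts may arrive as text with thousands separators, or be missing.
    out["cases"] = pd.to_numeric(
        out["cases"].astype(str).str.replace(",", "", regex=False),
        errors="coerce",
    )
    out["year"] = year
    return out


combined = pd.concat(
    [tidy(records_2013, 2013), tidy(records_2014, 2014)],
    ignore_index=True,
).dropna(subset=["cases"])  # drop rows whose count could not be recovered

print(combined)
```

Dropping rows with missing counts is only one possible choice; depending on the question being asked, it may be better to keep them and mark them as unreported.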

Analysis

  • After our dataset is prepared, we begin to analyse it.
  • Plotting libraries like Seaborn and Matplotlib come into play here.
  • Statistical analysis and other techniques are also applied to understand what the government has reported.
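
To give a flavour of the comparison across years and locations, here is a small sketch that pivots a consolidated table, computes the year-on-year change and draws a bar chart. The numbers are made up, the output file name is arbitrary, and the plot uses Pandas' Matplotlib-backed plotting rather than Seaborn purely to keep the example short.

```python
import os
import tempfile

import matplotlib

matplotlib.use("Agg")  # render off-screen so no display is needed
import matplotlib.pyplot as plt
import pandas as pd

# Made-up figures standing in for a consolidated dataset.
crimes = pd.DataFrame(
    {
        "state": ["Kerala", "Kerala", "Goa", "Goa"],
        "year": [2013, 2014, 2013, 2014],
        "cases": [1200, 1150, 300, 330],
    }
)

# One row per year, one column per state.
by_year = crimes.pivot_table(
    index="year", columns="state", values="cases", aggfunc="sum"
)

# Percentage change from one year to the next, per state.
change = by_year.pct_change().dropna() * 100
print(change.round(2))

ax = by_year.plot(kind="bar")
ax.set_xlabel("Year")
ax.set_ylabel("Reported cases")
ax.set_title("Reported cases by state and year (illustrative data)")

out_path = os.path.join(tempfile.gettempdir(), "crime_by_year.png")
ax.figure.savefig(out_path)
plt.close(ax.figure)
```

Raw counts like these are hard to compare between states of very different sizes, so a fuller analysis would likely normalise by population before drawing conclusions.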

Why Python?

  • Python offers a way to complete the entire process from start to finish without leaving the comfort of its grammar.
  • A large ecosystem of libraries.
  • Good documentation for most libraries.
  • Wonderful support through blogs, IRC, etc.

Prerequisites:

  • Basic Python syntax

Content URLs:

Will be made available before the talk.

Speaker Info:

I'm a Master's student at IIITM-K, currently working on my Master's thesis.
I love building things with Python, blogging, writing articles for OSFY and so on.
I've been swimming in Python for about 5 years now, and absolutely love it.

Section: Data Visualization and Analytics
Type: Talks
Target Audience: Beginner
Last Updated: