Visualizing government reports with Python
Arjoonn Sharma (~theSage21)
- We talk about how and where the Government is making data publicly available.
- We discuss what it is doing wrong.
- We learn how to use this data to understand what is happening.
- We get to use Python.
For demonstration, we pick police crime records and compare them across years and locations.
- The government is continuously releasing data via http://data.gov.in and various other state-owned websites.
- Most of the data is scattered and in forms which make analysis difficult. For example in
- We use Python to scrape links and save raw data from such sites.
- Selenium, BeautifulSoup, and html2text come into play here.
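
The link-scraping step above can be sketched roughly as follows. This is a minimal illustration, not the talk's actual scraper: the HTML snippet, the `example.org` base URL, the `extract_data_links` helper, and the list of file extensions are all assumptions made for the example. It only uses BeautifulSoup on an inline string, so it runs without network access.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical page snippet standing in for a real listing page.
SAMPLE_HTML = """
<html><body>
  <a href="/files/crime_2013.csv">Crime 2013</a>
  <a href="/files/crime_2014.xls">Crime 2014</a>
  <a href="/about">About</a>
  <a>No href here</a>
</body></html>
"""

# Assumed set of extensions that mark a link as a raw data file.
DATA_EXTENSIONS = (".csv", ".xls", ".xlsx", ".pdf")


def extract_data_links(html, base_url):
    """Return absolute URLs of anchors that point at data files."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    # href=True skips anchors that have no href attribute at all.
    for anchor in soup.find_all("a", href=True):
        href = anchor["href"]
        if href.lower().endswith(DATA_EXTENSIONS):
            # Resolve relative paths against the page URL.
            links.append(urljoin(base_url, href))
    return links


links = extract_data_links(SAMPLE_HTML, "http://example.org/catalog/")
for link in links:
    print(link)
```

On a real site the HTML would come from an HTTP request, or from Selenium when the page is built by JavaScript, and each collected link would then be downloaded and saved as raw data.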
- We merge all of our different sources into a single coherent dataset
- We get rid of inconsistencies and other blemishes
Pandas comes into play here.
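
As a rough sketch of the merge-and-clean step, assuming two per-year tables with slightly different column names and messy values: the column names, the `clean` helper, and all numbers below are made up for illustration and are not real crime figures.

```python
import pandas as pd

# Hypothetical raw tables, one per year, with typical inconsistencies:
# mixed header case, stray whitespace, thousands separators, duplicates.
raw_2013 = pd.DataFrame(
    {"State": [" Kerala", "GOA", "Goa"], "Thefts": ["1,200", "300", "300"]}
)
raw_2014 = pd.DataFrame({"state": ["kerala", "Goa "], "thefts": [1100, None]})


def clean(frame, year):
    """Normalize headers, state names and counts, and tag rows with the year."""
    out = frame.rename(columns=str.lower).copy()
    out["state"] = out["state"].str.strip().str.title()
    # Remove thousands separators, then coerce anything unparseable to NaN.
    as_text = out["thefts"].astype(str).str.replace(",", "", regex=False)
    out["thefts"] = pd.to_numeric(as_text, errors="coerce")
    out["year"] = year
    return out


# Stack the cleaned years into one dataset.
crime = pd.concat(
    [clean(raw_2013, 2013), clean(raw_2014, 2014)], ignore_index=True
)
# Drop exact duplicate rows and rows with no usable count.
crime = crime.drop_duplicates().dropna(subset=["thefts"]).reset_index(drop=True)
print(crime)
```

Whether to drop rows with missing counts or keep them for later imputation is a judgment call that depends on the dataset; dropping is used here only to keep the sketch short.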
After our dataset is prepared, we begin to analyse it.
- Plotting libraries such as Seaborn and Matplotlib come into play here.
- Statistical analysis and other techniques are also applied to understand what the government has reported.
- Python offers a way to complete the entire process from start to finish without leaving the comfort of its grammar
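
A comparison across years and locations might look like the sketch below. The states "A" and "B", the case counts, and the output file name are placeholders invented for the example, not reported data. Pandas' built-in plotting (which draws with Matplotlib) is used here for brevity; Seaborn could be substituted.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen so no display is needed

import matplotlib.pyplot as plt
import pandas as pd

# Placeholder long-format dataset: one row per (state, year).
crime = pd.DataFrame(
    {
        "state": ["A", "A", "B", "B"],
        "year": [2013, 2014, 2013, 2014],
        "cases": [100, 120, 200, 150],
    }
)

# One row per year, one column per state.
table = crime.pivot(index="year", columns="state", values="cases")

# Year-over-year change in percent; the first year has no predecessor.
pct_change = table.pct_change().dropna() * 100
print(pct_change.round(1))

# Grouped bar chart comparing states within each year.
ax = table.plot(kind="bar")
ax.set_xlabel("Year")
ax.set_ylabel("Reported cases")
ax.set_title("Reported cases by state and year (placeholder data)")
plt.tight_layout()
ax.figure.savefig("crime_by_year.png")
```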
- A large ecosystem of libraries
- Good documentation on most libraries
- Wonderful support through blogs, IRC, etc.
- Basic Python syntax
Will be made available before the talk.
I'm a Master's student at IIITM-K, currently working on my Master's thesis.
I love building things with Python, blogging, writing articles for OSFY and so on.
I've been swimming in Python for about 5 years now, and absolutely love it.