Visualizing government reports with Python
arjoonn sharma (~theSage21)
Description:
- We talk about how and where the Government is making data publicly available.
- We discuss what it is doing wrong.
- We learn how to use this data to understand what is happening.
- We get to use Python.
For demonstration, we pick police crime records and compare them across years and locations.
Verbose
Gathering
- The government is continuously releasing data via http://data.gov.in and various other state-owned websites.
- Most of the data is scattered and in forms which make analysis difficult, for example in pdf files.
- We use Python to scrape links and save raw data from such sites.
- Selenium, BeautifulSoup and html2text come into play.
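The link-scraping step could look roughly like the sketch below. To stay dependency-free it uses only the standard library's `html.parser` rather than BeautifulSoup, which offers a more convenient interface for the same job. The page fragment, the `example.org` base URL and the names `LinkCollector` and `find_links` are all hypothetical, chosen only for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            # Turn relative links into absolute ones.
            self.links.append(urljoin(self.base_url, href))


def find_links(html, base_url, suffixes=(".pdf", ".csv", ".xls")):
    """Return absolute links in `html` whose URL ends with one of `suffixes`."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    return [link for link in parser.links if link.lower().endswith(suffixes)]


# Hypothetical page fragment; a real run would download the page first.
SAMPLE_PAGE = """
<ul>
  <li><a href="/reports/crime-2013.pdf">Crime 2013</a></li>
  <li><a href="reports/crime-2014.CSV">Crime 2014</a></li>
  <li><a href="/about">About</a></li>
</ul>
"""

links = find_links(SAMPLE_PAGE, "http://example.org/catalog/")
for link in links:
    print(link)
```

Each collected link can then be downloaded and stored unchanged, so that the raw files stay available if the cleaning rules change later.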
Consolidation
- We merge all of our different sources into a single coherent dataset
- We get rid of inconsistencies and other blemishes
Pandas comes into play here.
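A minimal sketch of what consolidation might involve, assuming two sources that report the same kind of figures with different column names, spellings and number formats. The column names, the helper `tidy` and all values are placeholders made up for illustration; they are not real crime statistics.

```python
import pandas as pd

# Placeholder numbers, not real crime statistics.
source_a = pd.DataFrame({
    "State": ["  state a", "STATE B "],
    "Year": [2013, 2013],
    "Cases": ["1,200", "950"],       # numbers stored as text with commas
})
source_b = pd.DataFrame({
    "state": ["State A", "State B", "State B"],
    "year": ["2014", "2014", "2014"],
    "cases": [1300, 900, 900],       # the last row is a duplicate
})


def tidy(frame):
    """Normalise column names, state spellings and numeric types."""
    out = frame.rename(columns=str.lower).copy()
    out["state"] = out["state"].str.strip().str.title()
    out["year"] = out["year"].astype(int)
    out["cases"] = (
        out["cases"].astype(str).str.replace(",", "", regex=False).astype(int)
    )
    return out


# Stack the cleaned sources and drop rows that appear more than once.
crimes = (
    pd.concat([tidy(source_a), tidy(source_b)], ignore_index=True)
    .drop_duplicates()
    .sort_values(["state", "year"])
    .reset_index(drop=True)
)
print(crimes)
```

Keeping the cleaning rules in one function means every new source passes through the same steps before it joins the dataset.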
Analysis
After our dataset is prepared, we begin to analyse it.
- Plotting libraries such as Seaborn and Matplotlib come into play here.
- Statistical analysis and other techniques are also applied to understand what the government has reported.
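As a rough illustration of comparing records across years and locations, the sketch below computes the year-over-year change per state and draws one line per state. It plots with Matplotlib through pandas; Seaborn would work equally well. The state names and case counts are placeholders, not reported figures.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the script runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Placeholder numbers, not real crime statistics.
crimes = pd.DataFrame({
    "state": ["State A"] * 3 + ["State B"] * 3,
    "year": [2012, 2013, 2014] * 2,
    "cases": [1000, 1200, 1500, 800, 760, 950],
})

# One row per year, one column per state.
by_year = crimes.pivot(index="year", columns="state", values="cases")

# Year-over-year change in percent; the first year has no predecessor (NaN).
yoy = by_year.pct_change().mul(100).round(1)
print(yoy)

# One line per state, so trends can be compared at a glance.
fig, ax = plt.subplots()
by_year.plot(ax=ax, marker="o")
ax.set_xticks(list(by_year.index))
ax.set_xlabel("Year")
ax.set_ylabel("Reported cases")
ax.set_title("Reported cases by state (placeholder data)")
```

Calling `fig.savefig("cases_by_state.png")` afterwards would write the chart to disk.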
Why Python?
- Python offers a way to complete the entire process from start to finish without leaving the comfort of its grammar
- A large ecosystem of libraries
- Good documentation on most libraries
- Wonderful support through blogs, IRC, etc.
Prerequisites:
- Basic Python syntax
Content URLs:
Will be made available before the talk
Speaker Info:
I'm a Master's student at IIITM-K, currently working on my Master's thesis.
I love building things with Python, blogging, writing articles for OSFY and so on.
I've been swimming in Python for about 5 years now, and absolutely love it.