Messing with government data using Python
by Anand Chitipothu (speaking)
- Software Development Tools
- Technical level
During this election season, I've spent a lot of time scraping government data and parsing PDFs to extract a lot of useful data that was not available anywhere else. I've used Python extensively in this process.
This talk is about what I learned during the process, with tips for others who are about to take on similar adventures.
During this election season, I volunteered to provide technical assistance to a couple of election campaigns. During this process, I found that a lot of crucial information is very hard to find. To give you an idea, here are some examples:
- all assembly constituencies in a parliamentary constituency
- all wards in an assembly constituency
- all polling booths in an assembly constituency
- number of polling centers
I've also built a couple of tools using this data to help the campaigns.
Apart from these, close to the elections I had to build a website to find a polling booth from a voter ID, knowing that the Election Commission website was not going to work when needed. Again, this information was available only in PDFs and had to be extracted using Python.
Some people were keen to use voter lists on paper, so I wrote some Python scripts to sort the voters of a polling center by name, house number, etc., and to generate PDFs in a very compact form so that they take very few sheets of paper. Imagine printing voter lists for everyone in Bangalore. I'm sure I saved paper equivalent to a dozen trees.
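The sorting step could look roughly like the sketch below. It is only an illustration, not the scripts used for the campaign: the record fields (`name`, `house_no`), the helper names, and the sample voters are all made up. It sorts house numbers numerically, so that "2" comes before "10", and breaks ties by name.

```python
import re


def house_key(house_no):
    """Turn a house number such as '12A' into (12, 'A') for numeric ordering.

    Hypothetical helper: assumes house numbers usually start with digits.
    """
    text = house_no.strip()
    match = re.match(r"(\d+)(.*)", text)
    if match is None:
        # No leading digits: place these entries after the numbered houses.
        return (float("inf"), text.upper())
    return (int(match.group(1)), match.group(2).strip().upper())


def sort_voters(voters, by="house_no"):
    """Return a new list of voter records sorted by name or by house number."""
    if by == "name":
        return sorted(voters, key=lambda v: v["name"].casefold())
    # Sort by house number first, then by name within the same house.
    return sorted(
        voters,
        key=lambda v: (house_key(v["house_no"]), v["name"].casefold()),
    )


# Made-up sample records, for illustration only.
voters = [
    {"name": "Ravi", "house_no": "10"},
    {"name": "Asha", "house_no": "2B"},
    {"name": "Kiran", "house_no": "2"},
    {"name": "Meena", "house_no": "10"},
]

by_house = [v["name"] for v in sort_voters(voters)]
by_name = [v["name"] for v in sort_voters(voters, by="name")]
```

A plain string sort would put house "10" before house "2", which is why the sketch splits off the numeric part first.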
I used ReportLab to generate the PDFs and got hit by a performance issue in it. It was just a day before the elections and I had to finish the task. With no options left, I took a couple of servers on the cloud and ran things in parallel. I found a workaround only after burning my hands.
There were a lot more adventures. Come and listen to me if you are interested!
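One simple way to spread such a job over a few machines is to split the list of polling booths into roughly equal shards and hand one shard to each server. The sketch below shows only that splitting step, under assumptions of my own: the `shard` function, the booth identifiers, and the number of servers are hypothetical, and it says nothing about how the work was actually distributed.

```python
def shard(items, n_workers):
    """Split items into n_workers round-robin shards of near-equal size."""
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    # Worker i gets items i, i + n_workers, i + 2 * n_workers, ...
    return [items[i::n_workers] for i in range(n_workers)]


# Made-up booth identifiers; each shard would be processed on its own server.
booths = [f"booth-{i:03d}" for i in range(1, 8)]
shards = shard(booths, 3)

for server_no, booth_ids in enumerate(shards, start=1):
    print(f"server {server_no}: {', '.join(booth_ids)}")
```

Since each booth's PDF can be generated independently of the others, the shards need no coordination beyond collecting the output files at the end.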
Anand Chitipothu is a software consultant and trainer. He offers corporate training on Python and conducts public courses on Python programming in Bangalore.
He is an active member of the Indian Python community, was the coordinator of PyCon India 2012, organized PythonMonth in 2013, and is an elected member of the PSF.