Rewriting the Wayback Machine's live web proxy in Python
by Noufal Ibrahim (speaking)
- Web Development
This talk will discuss the design and development of a high-performance web app which successfully replaced a decade-old existing service without any hiccups.
The Wayback Machine is a high-traffic website that has been online for over a decade. It was a mostly Java application. One component of the application is the Liveweb proxy. This is an HTTP proxy that archives every resource requested through it, and it is the core data source for the Wayback Machine.
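To make the idea of an archiving proxy concrete, here is a minimal sketch of the "record, then return" pattern. It is not the Liveweb proxy's actual code: the names `fetch_and_archive` and `fake_fetch`, the list used as an archive, and the record layout are all assumptions made for illustration.

```python
from datetime import datetime, timezone


def fetch_and_archive(url, fetch, archive, now=None):
    """Fetch `url` with the supplied `fetch` callable, store a record of
    the response in `archive`, and return the response unchanged.

    `fetch` must return a (status, headers, body) tuple. Both `fetch` and
    `archive` are injected so the sketch runs without network access.
    """
    status, headers, body = fetch(url)
    fetched_at = now or datetime.now(timezone.utc)
    # Hypothetical record layout; a real archive would use a proper
    # archival file format rather than an in-memory dict.
    archive.append({
        "url": url,
        "fetched_at": fetched_at.isoformat(),
        "status": status,
        "headers": dict(headers),
        "body": body,
    })
    return status, headers, body


def fake_fetch(url):
    """Stand-in for a real HTTP client (assumption, for illustration)."""
    return 200, {"Content-Type": "text/plain"}, b"hello"


archive = []
response = fetch_and_archive("http://example.com/", fake_fetch, archive)
```

The client receives exactly what the origin server sent, while the archive gains one record per request.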
The Liveweb proxy was rearchitected from scratch in Python, deployed on the live website, and has been running for a few months now without a single hitch. The work involved working around limitations in the standard library, carefully tuning parameters to balance disk I/O and memory usage, and understanding and respecting fine details of the HTTP protocol.
This talk discusses how the new system is architected and designed to handle the kind of traffic and access patterns expected of an archiving proxy, and how it was deployed.
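One common way to trade memory against disk I/O when buffering response bodies is to keep small bodies in memory and spill larger ones to a temporary file. The sketch below illustrates that idea only; the class name `SpillBuffer` and the `max_in_memory` parameter are hypothetical, and nothing here is taken from the Liveweb proxy itself. The standard library's `tempfile.SpooledTemporaryFile` offers similar behaviour.

```python
import io
import tempfile


class SpillBuffer:
    """Buffer bytes in memory until `max_in_memory` is exceeded, then
    move everything to a temporary file on disk."""

    def __init__(self, max_in_memory=512 * 1024):
        # Tunable threshold: higher means more RAM per request,
        # lower means more disk I/O.
        self.max_in_memory = max_in_memory
        self.on_disk = False
        self._file = io.BytesIO()

    def write(self, chunk):
        would_exceed = self._file.tell() + len(chunk) > self.max_in_memory
        if not self.on_disk and would_exceed:
            # Copy what has been buffered so far into a temp file.
            spill = tempfile.TemporaryFile()
            spill.write(self._file.getvalue())
            self._file = spill
            self.on_disk = True
        self._file.write(chunk)

    def read_all(self):
        self._file.seek(0)
        data = self._file.read()
        # Return to the end so later writes still append.
        self._file.seek(0, io.SEEK_END)
        return data

    def close(self):
        self._file.close()


buf = SpillBuffer(max_in_memory=8)
buf.write(b"abcd")
buf.write(b"efgh")   # exactly at the threshold, still in memory
in_memory_before = not buf.on_disk
buf.write(b"i")      # crosses the threshold, spills to disk
```

Choosing the threshold is the tuning decision: it bounds the memory a single request can hold while keeping small responses off the disk entirely.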
Anand is a software consultant and trainer. He has been working with the Archive since 2007. He is co-ordinator of the PyCon India 2012 conference.
Noufal is a freelance trainer and consultant based out of Bangalore. He is the founder of PyCon India and organiser of the first two conferences in India.