Scaling Python/Django to millions of unique visitors and page-views (HackerEarth)
by Vivek Prakash (speaking)
Objective
The audience will learn what it takes to build a highly fault-tolerant, distributed web application that tolerates high load, offers minimal latency and is surprisingly fast. I will discuss such an architecture, including optimizations at the application level (Python/Django), the database (MySQL), Apache, RabbitMQ, Celery, Memcached, Tornado and HAProxy.
Description
Scaling up a web application is one of the hardest parts of handling huge traffic and load bursts. At HackerEarth, we are big fans of Python & Django. We have served up to a million unique visitors and tens of millions of pageviews in just over a year, including our Code-Evaluation servers, which have served over three million requests. I will talk about how to scale up fast without failing most of the time.
Python/Django
There are many lesser-known optimizations that can be done at the application level, including reducing the number of queries, writing code that takes O(1) time irrespective of the size of the data, and managing process memory efficiently. I will talk about some of them in detail.
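As a rough illustration of the O(1) point (not taken from the talk itself), the sketch below fetches a set of rows once and indexes them by id, so that later lookups do not scan the list or issue another query. The `users` list and both function names are hypothetical stand-ins for rows returned by a single query.

```python
# Hypothetical data: stand-in for rows returned by one database query.
users = [{"id": i, "name": f"user{i}"} for i in range(1000)]


def find_user_scan(user_id):
    """O(n): walks the whole list on every call."""
    for user in users:
        if user["id"] == user_id:
            return user
    return None


# Build the index once; each later lookup is an average-case O(1) dict access.
users_by_id = {user["id"]: user for user in users}


def find_user_indexed(user_id):
    """O(1) on average: a single dictionary lookup."""
    return users_by_id.get(user_id)
```

Both functions return the same row for a given id; only the cost per lookup differs once the data grows.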
Apache web-server
We use Apache with mod_wsgi for hosting the application. There is a general complaint that Apache sucks when it comes to hosting Python web applications. It’s said that it’s slow, bloated, uses lots of memory and doesn’t perform very well. It’s also said that it doesn’t handle a high number of concurrent requests.
All that is true if you are not running the Python application in the right way. If configured properly, Apache works fantastically and is rarely the reason for slowness. Slowness is almost always due to application bottlenecks and database latency. I will talk about how to set up mod_wsgi and Apache properly to handle traffic bursts.
MySQL (Database)
As the data and the number of requests/sec grow, the database becomes the major bottleneck in scaling the application dynamically. If at this point you are thinking that there are mythical (cloud) providers who can handle the growing needs of your application, you couldn’t be more wrong. To make the problem even harder, you can’t spin up a new database whenever you want, the way you can with your frontend servers. Achieving horizontal scalability at all levels requires a massive rearchitecture of the system that stays completely transparent to the end user. I will talk about how to scale up the database beautifully.
I will also briefly cover other components such as Memcached, Celery and RabbitMQ, focusing mostly on how to leverage them for faster response times and on using the concept of ‘deferred processing’ for most tasks.
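The idea of deferred processing can be sketched with the standard library alone. This is only an illustration under stated assumptions: a `queue.Queue` stands in for the message broker and a background thread stands in for a worker process, whereas the setup described here would use RabbitMQ and Celery for those roles. The names `send_welcome_email` and `handle_request` are hypothetical.

```python
import queue
import threading

task_queue = queue.Queue()  # stand-in for the message broker
results = []                # records work done by the background worker


def send_welcome_email(address):
    # Hypothetical slow task; here it only records that it ran.
    results.append(f"sent:{address}")


def worker():
    # Pull tasks off the queue until the stop sentinel (None) arrives.
    while True:
        task = task_queue.get()
        if task is None:
            task_queue.task_done()
            break
        func, args = task
        func(*args)
        task_queue.task_done()


def handle_request(address):
    # Enqueue the slow work and answer immediately instead of waiting for it.
    task_queue.put((send_welcome_email, (address,)))
    return "202 Accepted"


thread = threading.Thread(target=worker, daemon=True)
thread.start()
response = handle_request("dev@example.com")
task_queue.join()     # wait here only so the demo can inspect the results
task_queue.put(None)  # ask the worker to stop
thread.join()
```

The request handler returns before the email task runs; the response time no longer depends on how long the deferred work takes.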
Speaker bio
I am the CTO & Co-founder of HackerEarth, a platform for developers to solve programming challenges and connect with each other. I started building HackerEarth about a year ago, and it has quickly become a large production environment that throws up new challenges every day in scaling it smoothly. I will share the experiences and lessons I gained while building it; some of them are little known publicly but essential to building any good product.
I am 23 years old, and graduated in 2013 from IIT Roorkee.
Vivek, this sounds interesting, but there are other Django proposals already in the funnel. One of them is from Arun, a known Django practitioner and past speaker. It might be useful to provide your slide deck or more detail on what some of your learnings were that you plan to cover. Some of the topics you've mentioned were already covered last year by Arun. Also, please separate the content of your talk from a pitch about your company (including the title).
Hi Aravind, somehow I missed your comment among the others. The talk is not a pitch about the company; I just added the name to lend some credibility to the talk. Do you still want me to provide more details beyond those already mentioned in the talk description?
Vivek, either Krace or I will try to call you and give you a bit more content. The gist of it is that we feel this proposal is too broad to do enough justice to specific learnings. Plus, there is already another Django proposal that was confirmed.