
Looking beyond threads and blocking sockets: Dhaga

by Prahlad Nishal (speaking)

Section
Core Python
Technical level
Intermediate

Objective

The talk focuses on the real-world challenges of scaling a network-intensive application and improving its performance, and how we can overcome them without changing the existing code flow. The solution we propose is Dhaga, our abstraction for a lightweight thread of execution based on Greenlets and Tornado. This lets us perform asynchronous network IO with sequential code flow and scale to thousands of sockets, or Dhagas.

Description

Most network-intensive applications use threads with blocking sockets for network operations. With blocking sockets, the code flow stays simple and sequential.
When an application needs better performance or scale, we increase the number of threads. But when it needs thousands of sockets or threads, the usual practice is to switch to either an asynchronous socket library or multiple processes; both require substantial code changes and break the sequential code flow.
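For illustration, here is a minimal sketch of this conventional model (the host, port, request, and worker count below are placeholder values): one OS thread per transfer, where every socket call blocks its thread until it completes.

    # One blocking-socket worker per OS thread: simple, sequential code flow.
    import socket
    import threading

    def fetch(host, port, request):
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.connect((host, port))      # blocks until the TCP connection is up
        s.sendall(request)           # blocks until the request is written
        data = s.recv(4096)          # blocks until some response data arrives
        s.close()
        return data

    req = b"GET / HTTP/1.0\r\nHost: example.com\r\n\r\n"
    workers = [threading.Thread(target=fetch, args=("example.com", 80, req))
               for _ in range(32)]
    for t in workers:
        t.start()
    for t in workers:
        t.join()

Scaling this model means adding more threads, which quickly runs into memory and context-switching overhead.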

To address this problem, we propose Dhaga (meaning "thread" in Hindi), our abstraction for a lightweight thread of execution. The Dhaga class is derived from Greenlet, which implements stack switching to execute multiple code flows within one OS thread. An OS-level thread executes multiple Dhagas with cooperative scheduling. Whenever a Dhaga needs to perform network IO, it assigns the work to Tornado, hands control over via a Greenlet switch, and waits for its turn to continue execution.
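The actual Dhaga implementation is not public, so the following is only a hedged sketch of the mechanism described above (class and method names are illustrative, and it assumes a callback-style Tornado, i.e. pre-6.0): the greenlet delegates the IO to Tornado, switches away while the IO is in flight, and is switched back from the Tornado callback when the data arrives.

    # Illustrative sketch only, not the actual product code.
    import socket
    import greenlet
    from tornado.ioloop import IOLoop
    from tornado.iostream import IOStream

    class Dhaga(greenlet.greenlet):
        """A lightweight flow of execution, cooperatively scheduled in one OS thread."""

        def fetch_headers(self, host, port, request):
            stream = IOStream(socket.socket(socket.AF_INET, socket.SOCK_STREAM))
            result = {}

            def on_headers(data):
                result['data'] = data
                self.switch()                    # IO done: resume this dhaga

            def on_connect():
                stream.write(request)
                stream.read_until(b"\r\n\r\n", on_headers)

            stream.connect((host, port), on_connect)
            self.parent.switch()                 # yield while the IO is in flight
            return result['data']                # sequential flow resumes here

    def flow():
        me = greenlet.getcurrent()               # the Dhaga running this flow
        headers = me.fetch_headers("example.com", 80,
                                   b"GET / HTTP/1.0\r\nHost: example.com\r\n\r\n")
        print("got %d header bytes" % len(headers))
        IOLoop.instance().stop()

    Dhaga(flow).switch()                         # start the dhaga
    IOLoop.instance().start()                    # drive the asynchronous IO

The calling code in flow() stays sequential; the callback plumbing is hidden inside the abstraction.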

Greenlets are micro-threads with no implicit scheduling, which is useful when you want to control exactly when your code runs. You can build custom-scheduled micro-threads on top of Greenlet; we use Greenlets with cooperative scheduling driven by our own scheduler.
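A minimal greenlet example (independent of Dhaga) shows this explicit control: nothing runs until some other code calls switch() on it.

    from greenlet import greenlet

    def task_a():
        print("A: step 1")
        gr_b.switch()        # explicitly hand control to task_b
        print("A: step 2")

    def task_b():
        print("B: step 1")
        gr_a.switch()        # explicitly hand control back to task_a

    gr_a = greenlet(task_a)
    gr_b = greenlet(task_b)
    gr_a.switch()            # prints: A: step 1, B: step 1, A: step 2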

Tornado is a simple, non-blocking web server framework written in Python, designed to handle thousands of asynchronous requests. We use its core components, IOLoop and IOStream.
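As a rough illustration of these two components (again assuming a callback-style Tornado, pre-6.0, and placeholder hosts), a single IOLoop can drive many non-blocking IOStream connections at once:

    import socket
    from tornado.ioloop import IOLoop
    from tornado.iostream import IOStream

    pending = [0]                                # count of in-flight requests

    def fetch(host, port, request):
        stream = IOStream(socket.socket(socket.AF_INET, socket.SOCK_STREAM))
        pending[0] += 1

        def on_headers(data):
            print("%s: %d header bytes" % (host, len(data)))
            stream.close()
            pending[0] -= 1
            if pending[0] == 0:
                IOLoop.instance().stop()         # all responses received

        def on_connect():
            stream.write(request)
            stream.read_until(b"\r\n\r\n", on_headers)

        stream.connect((host, port), on_connect)

    for host in ("example.com", "example.org"):
        req = ("GET / HTTP/1.0\r\nHost: %s\r\n\r\n" % host).encode("ascii")
        fetch(host, 80, req)
    IOLoop.instance().start()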

We have successfully shipped Dhaga in our product, and it is actively used in our production servers.

Example of the improved performance of a backup application using Dhaga:

Data size: 1 GB (32K files)
Time with threads (32 threads): 44 mins
Time with Dhaga (4 threads * 32 dhagas): 14 mins

Data size: 2.2 GB (Windows Program Files)
Time with threads (32 threads): 37 mins
Time with Dhaga (4 threads * 32 dhagas): 15 mins

In addition, we improved our web server to process 10K concurrent requests without changing application-level code; we only replaced threads with Dhagas.

In my talk, I will explain the following concepts:

  • Scaling and performance challenges of network applications using threads and blocking sockets, for example, a threaded web server.
  • Introducing Dhaga, its architecture, and how it can be used.
  • Replacing threads with Dhagas with minimal code changes to improve performance and scale out.

Requirements

Attendees should have a basic knowledge of:

  • Python 2.7.x
  • Sockets
  • Threads

Speaker bio

Prahlad Nishal is a Senior Software Engineer who has been working at Druva for the last 4 years, primarily on networking code. Python is the primary language for most of his projects.

Comments


  • Kushal Das, 269 days ago

    Can you please post the links to the codebase?


  • Aravind Krishnaswamy, 229 days ago

    I echo Kushal: is this open source? Please provide a link to the code. It would also be great if you could provide links to videos or slides of past speaker sessions; that would help people decide whether they would like to attend.


  • Prahlad Nishal, 228 days ago

    As discussed earlier, we have been actively using Dhaga in our production servers. Right now it is embedded in the product, but we are still exploring how to open-source it in the future. We will cover implementation details during the presentation, such as class diagrams, class details, call flows, code snippets, etc.
