Achieving true parallelism in Python: the past, present & future of parallel code in Python

Rishi Raj (~RishiRaj22)




This talk is an adaptation of my article:

It aims to answer the following questions:

  • Have you ever used a blazing fast Python library and wondered why your own code can't run that fast? What do these libraries do under the hood?
  • What are the limitations of multi-tasking in Python? How can I get rid of these limitations and utilize processors more effectively using Python?
  • Are there any new Python features that can help me write code that makes better use of processors?

Through this talk, we will demystify general multi-tasking concepts (process, thread, synchronization, context switch, etc.) and cover the limitations of these paradigms in Python. In the latter half of the talk, we’ll explore existing mechanisms to achieve parallelism in Python code. Finally, we’ll build a generic solution to achieve true parallelism for Python code running in a single CPython process using new Python 3.12 features.
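As a rough illustration of the thread limitation mentioned above, the sketch below times a CPU-bound function run twice in sequence and then on two threads. The names `count_down`, `run_sequential` and `run_threaded`, and the workload size, are illustrative choices and are not taken from the talk. On a standard (GIL-enabled) CPython build the two timings are typically similar, because only one thread executes Python bytecode at a time; exact numbers depend on the machine and the build.

```python
import threading
import time


def count_down(n: int) -> int:
    """Pure-Python, CPU-bound loop; returns the number of iterations done."""
    steps = 0
    while n > 0:
        n -= 1
        steps += 1
    return steps


def run_sequential(n: int, repeats: int = 2) -> float:
    """Run the workload `repeats` times back to back; return elapsed seconds."""
    start = time.perf_counter()
    for _ in range(repeats):
        count_down(n)
    return time.perf_counter() - start


def run_threaded(n: int, workers: int = 2) -> float:
    """Run the same workload on `workers` threads; return elapsed seconds."""
    threads = [
        threading.Thread(target=count_down, args=(n,)) for _ in range(workers)
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    n = 2_000_000  # illustrative workload size, not a benchmark from the talk
    print(f"sequential:  {run_sequential(n):.3f}s")
    print(f"two threads: {run_threaded(n):.3f}s")
```

The sketch only measures wall-clock time for a toy loop, so it should be read as a demonstration of the effect rather than a benchmark.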

Agenda of the talk:

  1. I’ll start by covering general concurrency and parallelism concepts:
    1. What is concurrency & parallelism?
    2. What is multi-threading & multi-processing?
    3. Pitfalls with multi-threading (context switches, synchronization troubles, lock contention, etc.)
    4. Co-routines
  2. Then, we'll cover commonly used multi-tasking paradigms in Python:
    1. Multi-threading in Python
      1. How does it work?
      2. Limitations of threads in Python (the Global Interpreter Lock & how the Python interpreter interacts with it)
    2. Concurrency using threads & co-routines
    3. Parallelism using multiple processes
  3. Then, we will look at ways to achieve true parallelism within a single Python process:
    1. How do libraries like NumPy achieve this?
    2. How can you achieve the same? (Demo using CPython extensions)
  4. Finally, we will build a generic solution to achieve true parallelism for pure Python code in Python 3.12 using sub-interpreters
    1. Design of the generic solution
    2. Using & benchmarking the solution
    3. Limitations & pitfalls with sub-interpreters
    4. Future of sub-interpreters (Multiple Interpreters in the Stdlib)
  5. Conclusion
    1. Recap of ways to achieve concurrency & parallelism.
    2. Q&A
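To make the co-routine items in the agenda concrete, here is a minimal sketch of cooperative concurrency on a single thread using the standard `asyncio` module. The `fetch` co-routine and its delays are hypothetical stand-ins for I/O-bound work and do not come from the talk material.

```python
import asyncio


async def fetch(name: str, delay: float) -> str:
    """Stand-in for an I/O-bound task; the sleep yields control to the event loop."""
    await asyncio.sleep(delay)
    return f"{name} done"


async def main() -> list[str]:
    # gather() runs the co-routines concurrently on one thread and returns
    # the results in the order the awaitables were passed in.
    return await asyncio.gather(
        fetch("a", 0.02),
        fetch("b", 0.01),
    )


if __name__ == "__main__":
    print(asyncio.run(main()))
```

The two tasks overlap while they wait, but no two pieces of Python code run at the same instant, which is the distinction between concurrency and parallelism that the agenda opens with.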

[Note] Since this talk covers C++ code, which might not be familiar to most folks, I’ll explain the C++ code through design diagrams and words as much as possible instead of live-coding. Anyone interested in the actual implementation details can check the linked source code directly.

[Audience takeaway] I believe this talk would help Python developers at all experience levels better understand and appreciate the advantages, disadvantages & loopholes associated with running things concurrently & in parallel in Python. For more experienced folks, it would give a sense of the paradigm shifts happening with newer versions of Python. I hope folks apply this knowledge in their day-to-day work to achieve ideal concurrency/parallelism results using Python.


  • Basics of Python

Speaker Info:

Personal details:

Name: Rishi Raj


Organization: D.E. Shaw

Designation: Senior Member Technical

Experience: ~4 years


Over the past few years, I've extensively developed software at the intersection of Python and C++, leveraging Python for dynamic business logic and C++ for core computational tasks. Such a setup ensures high performance through compiled C++ code while offering flexibility and easy prototyping with Python. While I've presented technical talks within my current organization, this marks my debut in applying for an external tech talk. I'm eager to share my insights with the broader Python community.

Speaker Links:

Section: Core Python
Type: Talk
Target Audience: Intermediate
Last Updated: